AN ANALYTIC MODEL OF PHYSICAL DATABASES


Abstract 


Physical  databases  are  decomposable  into  a  collection  of  simple  files  and 
linksets.  Simple  files  are  structures  that  organize  records  of  a  file.  Linksets  are 
structures  that  link  records  of  one  file  to  those  of  another.  Classical  simple  files 
include  hash  based,  unordered,  and  indexed  sequential  files;  classical  linksets 
include  inverted  lists,  ring  lists,  and  parent  pointers.  Using  classical  structures 
as  a  basis,  unifying  models  of  simple  files  and  linksets  are  developed.  Together 
these  models  serve  as  powerful  tools  for  describing  a  wide  spectrum  of  physical 
databases. 

Primitive  file  and  linkset  operations  are  identified  and  cost  equations  for 
these  operations  are  developed.  These  operations  are  then  augmented  with 
Pidgin  ALGOL  constructs  so  that  database  transactions  can  be  modeled.  By 
applying  simple  statement-expression  translation  rules  to  a  transaction,  an 
expression  estimating  the  cost  of  processing  the  transaction  can  be  derived.  In 
this  way,  the  task  of  analyzing  transactions  may  be  simplified. 

To supplement the above, a unifying model of file evolution is proposed.
The model explains and predicts the evolution of certain file statistics, such as
the average length of an overflow chain, as records are inserted and deleted
from a file. Such statistics are indispensable when accurate estimates of a file’s
performance over extended periods of time are desired. Applications of the
model to hash based, indexed sequential, and B+ tree files, among others, have
been validated by simulation studies.

The  simple  file,  linkset,  transaction,  and  file  evolution  models  collectively 
define  an  analytic  model  of  physical  databases.  This  composite  model  is  shown 
to  unify  and  generalize  many  former  works.  Specifically,  new  results  concerning 
the  problems  of  structure  selection  and  index  selection  are  presented.  Also,  a 
new  method  is  proposed  for  solving  the  combined  problems  of  file  design  and  file 
reorganization. 
CHAPTER 1. INTRODUCTION


Performance is an important concern in database design. Over the last ten
years, a wide variety of studies have contributed to the understanding and
articulation of many major design issues in files and databases. Such studies have
addressed hash based files ([Van73], [SeDu76]), B+ trees ([KeLa74], [NaMi78]),
transposed files ([Hoff75], [MaSe77], [Nia78], [Bat79]), batched searching
([ShGo76]), performance evolution of files as records are inserted and deleted
([Van73], [NaMi78]), index selection ([King74], [YuWo75], [Schk75], [AnBe77]),
file reorganization ([Shn73], [YTD76], [LoMu77], [Tuel78]), multifile query
processing ([WoYo76], [BlEs77], [Yao79]), dynamic hash based files ([Lar78], [Lit78],
[FNPS79]), generalized access path structures ([Haer78]), differential files
([SeLo76]), and searching multilist files ([ClYa78]). Due to the disparity of these
works, a global understanding of how these works and design issues relate is
lacking. Such understanding may be gained through a unifying model.

Prior to the appearance of most of these works, a number of important
studies contributed toward a unifying model. Hsiao and Harary ([HsHa70]) were
among the first to describe a spectrum of file structures by a small collection of
parameters. Severance ([Sev72]) extended this approach by introducing the
concept of pointer vs. sequential linkages in describing file implementations.
Yao ([Yao74], [Yao77]) significantly generalized Severance’s work by identifying
file structures with directed trees. Unfortunately, these models cannot be used
to study certain important problems in database design, nor can they embrace
recent contributions in file design. This is due primarily to their early
conception and imprecise formulation.

In this thesis, a framework is presented for an analytic model of physical
databases (i.e., storage mappings of data) which synthesizes many former works.
This framework is proposed as a means for unifying the study of database
performance.

1.1.  Concepts  for  a  Model  Framework 

Physical databases are large networks of interconnected files. In order to
provide a way of describing them simply, the notion of physical database
decomposition is introduced. Physical databases can be decomposed into a collection
of simple files (or primitive files) and linksets. A simple file is a structure that
organizes records of a single file. Classical simple file structures include hash
based, indexed sequential, B+ trees, and unordered files. A linkset is a
structure that connects records of one file to those of another. Classical linkset
structures include parent pointers, inverted lists, ring lists, and cellular multilists.
Since simple files and linksets are not difficult to model, decomposition provides
an attractive means for describing the structure of physical databases.

A primary goal of database performance models is to estimate the
efficiency of operations which are to be executed. In the context of
decomposition, it is necessary to identify basic operations which are performed on simple
files and linksets, and to develop expressions that estimate the cost of
performing these operations. As a consequence, long and complicated navigational
paths through a physical database can be described in terms of a sequence of
simple file and linkset operations. Once expressions for estimating the cost of
each of these operations are determined, it is possible to estimate the cost of
traversing navigational paths.

To support claims of generality, a unifying model of physical databases is
developed. The proposed model is a composition of four distinct but related
models. They are: 1) a model of simple files, which concerns the modeling of
simple files and operations on these files; 2) a model of file evolution, which
assesses the impact of insertions and deletions on file performance; 3) a model
of linksets, which concerns the modeling of linkset structures and operations on
these structures; and 4) a model of transactions, which models operations on
physical databases in terms of simple file and linkset operations. Related
through decomposition, these four models form the core of an analytic model of
physical databases.

The  models  of  simple  files,  file  evolution,  linksets,  and  transactions  are 
developed  in  successive  chapters.  Collectively,  these  models  are  used  to  study 
new  problems  and  to  generalize  earlier  work.  Specifically,  in  Chapter  6,  a  new 
method  is  proposed  for  solving  the  combined  problems  of  file  design  and  file 
reorganization.  Previously,  these  problems  had  been  addressed  separately. 
New  results  concerning  the  problems  of  structure  selection  and  index  selection 
are also presented.
CHAPTER 2. A MODEL OF SIMPLE FILES


A  simple  file  is  a  structure  that  organizes  records  of  a  single  file.  Classical 
simple  file  structures  include  hash  based,  indexed  sequential,  B+  trees,  and 
unordered  files. 

In this chapter, a model of simple files is presented. We begin by defining a
unifying model of simple file structures. We then identify basic operations that
are performed on simple files, and develop expressions that estimate the cost of
executing these operations.

2.1  Structure  of  a  Simple  File 

The  basic  component  of  a  file  structure  is  a  node  which  contains  zero  or 
more  records.  The  records  of  a  node  are  stored  in  one  or  more  blocks,  where 
one  block  is  designated  as  the  primary  block  and  the  remaining  blocks  are 
overflow  blocks.  Records  that  are  stored  in  overflow  blocks  are  overflow  records. 
The  overflow  records  of  a  node  are  connected  in  a  linear  list  fashion,  where  the 
head  of  the  list  is  stored  in  the  node’s  primary  block  (Fig.  2.1).  The  primary 
block of a node contains only records belonging to that node. It is possible,
however, for an overflow block to contain records from different nodes.
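
As an illustration, the node organization described above can be sketched in Python. The class and field names (Node, capacity, primary, overflow) are assumptions introduced for this sketch and do not come from the thesis. The overflow chain is simplified to a per-node list, so the sharing of an overflow block among several nodes is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node: a primary block plus a chain of overflow records (hypothetical names)."""
    capacity: int                                      # record slots in the primary block
    primary: List[str] = field(default_factory=list)   # records held in the primary block
    overflow: List[str] = field(default_factory=list)  # overflow chain, head of list first

    def insert(self, record: str) -> None:
        # Use the primary block while it has a vacant slot; otherwise chain the record.
        if len(self.primary) < self.capacity:
            self.primary.append(record)
        else:
            self.overflow.append(record)

    def has_overflow(self) -> bool:
        return bool(self.overflow)


node = Node(capacity=2)
for rec in ["r1", "r2", "r3"]:
    node.insert(rec)
```

With a primary block capacity of two, the third record becomes an overflow record of the node.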

Simple file structures are modeled as uniform height, directed trees called
file structure access trees (FSAT). A vertex of an FSAT corresponds to a node
(Fig. 2.2). If a node is drawn as a box (□), then the node contains no records in
overflow. If a node is drawn as a box with a tail, then the node contains
one or more records in overflow. The FSAT of Figure 2.2 depicts a simple file
structure which has three (leaf) nodes that possess overflow records.

Figure 2.1 Structure of a Node (labels: Primary Block, Overflow Blocks)

Figure 2.2 A File Structure Access Tree

Two major subdivisions of an FSAT are the nodes of the leaf level and
nonleaf levels, here called the base file and cluster index. All records of a data file
are stored in the base file. Records of the base file and cluster index are,
respectively, base file records and cluster index records.

Two important properties of simple file structures and FSATs are 1) the
out-degree of a nonleaf vertex, which is the number of cluster index records stored
in the node represented by that vertex, and 2) the pointer in each cluster index
record on level i, which points to a distinct node on level i-1. These pointers are
the outgoing arcs of vertices in an FSAT (cf. [Yao77]).

A  purpose  of  the  cluster  index  is  to  facilitate  the  efficient  accessing  of  base 
file  records.  This  is  accomplished  with  the  aid  of  a  key  called  the  cluster  key, 
which  is  present  in  all  base  file  records  and  every  cluster  index  record  which  is 
not  stored  in  the  root  node.  When  a  record  is  accessed,  each  consecutive  level 
of  the  cluster  index  provides  a  way  of  restricting  a  search  to  only  that  portion  of 
the file that contains the desired record. Interpreting this process as a
partitioning of a data space of linearly ordered cluster keys, each cluster index
record  defines  the  boundaries  of  an  interval  of  cluster  keys.  This  is  realized  by 
storing  in  each  cluster  index  record  the  highest  valued  cluster  key  of  the 
record’s  corresponding  interval.  Consequently,  each  node  of  a  simple  file  is 
identified  with  a  cluster  key  interval,  and  all  records  stored  in  that  node  possess 
cluster  keys  belonging  to  that  interval  ([Yao77]).  The  simple  file  structures  of 
Figures  2.3  and  2.4  illustrate  these  ideas. 
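
The interval interpretation can be made concrete with a small sketch. It assumes a cluster index node is held as a sorted list of (highest key of interval, child) pairs; this layout and the names select_child and index_node are illustrative assumptions, not the thesis's notation.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# A cluster index node as (highest cluster key of interval, child id) pairs,
# sorted by key. The layout is an assumption made for this sketch.
IndexNode = List[Tuple[int, str]]


def select_child(index_node: IndexNode, key: int) -> Optional[str]:
    """Return the child whose cluster key interval contains `key`, if any."""
    high_keys = [high for high, _ in index_node]
    pos = bisect_left(high_keys, key)   # first interval whose highest key >= key
    if pos == len(index_node):
        return None                     # key lies above every interval
    return index_node[pos][1]


index_node = [(100, "node_a"), (200, "node_b"), (300, "node_c")]
```

Repeating this selection at each consecutive level of the cluster index restricts the search to the one base file node whose interval contains the key.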

Only records of the root node need not contain cluster keys. If cluster keys
are present, the simple file structure is said to have a root index; otherwise it
has a root directory. A root directory is simply an array of one or more pointers
to the nodes of the level immediately below the root. Structures with a root
directory and root index are shown in Figures 2.3 and 2.4.

* Let A be an attribute and v be a value which can be assigned to A. A key is an ordered
pair (A, v) which is used to locate zero or more records of a file that have v as their A value. If
a designated attribute assumes distinct values for each record in the file, then a key based
on this attribute is an identifier. In common usage, the terms key and identifier refer either
to the key's attribute or to its value. Context in general clarifies the meaning.

A third property of simple file structures is the ability to access all nodes of
the base file without accessing nonroot nodes of the cluster index. This is
realized by an address sequential or pointer sequential linking of the primary blocks
of the base file nodes. A pointer to the first base file node is retained in the
simple file structure’s root node. The sequential linking of the root node and base
file nodes is indicated by the dashed arcs in an FSAT (see Fig. 2.2).

2.1.1  Parameterization  of  Simple  Files 

Three important objectives in modeling simple file structures are
recognizing file structure design strategies, acquiring selected file data, and obtaining
storage and block access cost figures. Formalizing these objectives results in
design, file, and cost parameters.

Design parameters. The cluster key is the key on which a file is organized.
There are three identifiable types of cluster keys: 1) a logical valued key
corresponds to a key which explicitly exists in a base file record, 2) a hash key
corresponds to an algebraic transformation of a logical valued key, and 3) a
relative location key is a key used in unordered files to indicate a record’s
position relative to the start of the file (e.g., the ith record of the file). Let Ck denote
the type of cluster key used in a simple file.

In estimating record retrieval costs, it is useful to know the "granularity" of
a simple file’s cluster key data space. Let Maxk indicate this granularity:

    Maxk = 1/(maximum number of distinct cluster keys)

Figure 2.3 An Indexed Sequential File

Figure 2.4 A Dynamic Hash Based File

For many structures, the maximum number of cluster keys is so large that, to a
good approximation, Maxk=0. In contrast, common hash based files [SeDu76]
have Maxk>0. For example, each hash key of a hash based file is assigned to a
distinct bucket, so Maxk = 1/(number of buckets).
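
A short numeric sketch of the definition follows. The function name maxk and the two sample sizes (500 buckets, a key space of 10^12 values) are hypothetical values chosen only for illustration.

```python
def maxk(max_distinct_keys: int) -> float:
    """Maxk = 1 / (maximum number of distinct cluster keys)."""
    if max_distinct_keys <= 0:
        raise ValueError("the number of distinct cluster keys must be positive")
    return 1.0 / max_distinct_keys


hash_file_maxk = maxk(500)           # hash based file with 500 buckets (hypothetical)
large_space_maxk = maxk(10 ** 12)    # key space large enough that Maxk is close to 0
```

The first value is clearly positive, while the second is small enough to be treated as zero in a cost estimate.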

Other  design  parameters  are  defined  in  Table  2.1.  Values  for  Ri,  Mri,  and 
Roi  are  specified  for  each  level  of  an  FSAT. 

File parameters assume statistically determined values which characterize
record populations of base file and cluster index nodes. Table 2.1 defines a set
of file parameters that assume (possibly different) values at each of the L+1
levels of an FSAT. Level 0 designates the base file.

Cost parameters deal with economic considerations of storing and accessing
blocks on external devices. Costs associated with storing and accessing data in
main memory are assumed negligible. Table 2.1 defines salient cost
parameters.

A simple file structure is described by the values assigned to the
parameters of Table 2.1. This collection of values is called the file’s descriptor.
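
One way to picture a descriptor is as a record of parameter values. The sketch below is a partial illustration only: the class name FileDescriptor, the field names, and every numeric value are assumptions, and only a subset of the parameters of Table 2.1 is included (cost parameters are omitted). The sample describes an indexed sequential file, which uses overflow (Split=0) and keeps records in key order (Ascend=1); the choice Rindex=0 is assumed for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileDescriptor:
    """A partial file descriptor (hypothetical field names, subset of Table 2.1)."""
    ck: str          # cluster key type: "logical valued", "hash", or "relative location"
    maxk: float      # 1/(maximum number of distinct cluster keys)
    rindex: int      # 1 = root index, 0 = root directory
    split: int       # 1 = node splitting, 0 = overflow
    ascend: int      # 1 = records kept in key order within base file nodes
    r: List[int]     # Ri: primary block capacity per level, index 0 = base file
    n: int           # N: number of records in the file
    height: int      # L: height of the FSAT
    z: List[int]     # Zi: number of nodes per level, index 0 = base file

    def is_consistent(self) -> bool:
        # Per-level values are needed for each of the L+1 levels,
        # and a tree has exactly one root node.
        levels = self.height + 1
        return (len(self.r) == levels and len(self.z) == levels
                and self.z[self.height] == 1)


# Hypothetical indexed sequential file with two index levels.
isam = FileDescriptor(ck="logical valued", maxk=0.0, rindex=0, split=0, ascend=1,
                      r=[10, 20, 20], n=1000, height=2, z=[100, 5, 1])
```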

Not all value assignments to simple file design parameters describe
implementations that are meaningful. Those assignments that correspond to
recognized structures are listed in Table 2.2.
Design Parameters

  Ck       cluster key type: logical valued, hash, relative location
  Maxk     1/(maximum number of distinct cluster keys)
  Rindex   structure has a root index (=1) or root directory (=0)
  Split    structure accommodates file growth by node splitting (=1) or overflow (=0)
  Ascend   records within base file nodes maintained in ascending or descending
           logical valued key order (=1) or not (=0)
  Ri       record capacity of a primary block on level i
  Roi      record capacity of an overflow block on level i (default value is 1)
  Mri      minimum record occupancy of a primary block on level i (default value is 0)

File Parameters

  N        number of records in file
  Zi       number of nodes on level i
  L        height of structure’s FSAT
  Hi       expected number of records in a primary block on level i
  Gi       expected length of an overflow chain on level i
  Di       expected number of overflow records accessed when locating a single
           record within a node on level i
  Pfulli   probability that when a record is to be inserted into a node on level i,
           the node’s primary block has no vacant record slots
  Povi     fraction of nodes that have overflowed on level i, or probability
           that a node of level i will underflow when a record is deleted from it
  Pmeri    probability that when a level i node underflows, a merging
           of two nodes results

Cost Parameters

  S, So    storage cost of a primary and overflow block per unit time
  A, Ao    access cost of a primary and overflow block

Table 2.1. Parameters of a File Descriptor
Generic Structure      Generic Structure Descriptor Values
Name                   Ck   Split   Ascend   Rindex   L (typical)

indexed aggregate      0    0       0        *        ≥2
sequential             0    0       1        0        1
indexed sequential     0    0       1        *        ≥2
B+ tree                0    1       1        *        ≥2
hash based             1    0       0        1        1
dynamic hash based     1    1       0        1        2
unordered              2    0       0        *        ≥1

Note: * = Rindex may assume the values 0 or 1

Table 2.2 A Catalog of Simple File Structures


2.1.2  Examples 

Indexed sequential files use overflow to accommodate file growth, and
maintain records sorted on ascending (logical valued) cluster keys. An indexed
sequential structure consists of a fixed number of base file nodes whose primary
blocks are stored sequentially in secondary storage. Typical implementations
equate primary blocks to tracks, so that each level of a cluster index takes on a
physical significance (e.g., level 1 = track index and level 2 = cylinder index
([IBM76], [Yao77])). Figure 2.3 illustrates an indexed sequential file with its
descriptor.

A variant of indexed sequential designs is the indexed aggregate file, which
has the same design parameter values as an indexed sequential file except that
Ascend=0 (i.e., records within a node are not maintained in key order). Such
structures are well suited for network database implementations (see Section
3.2.2).


Dynamic hash based files use hash keys to store and locate records, and use
node splitting to accommodate file growth. The dynamic hash based file that is
examined in this thesis is based on the structure of [FNPS79], and is illustrated
in Figure 2.4. The nodes of level 1 form the directory.² Since a directory
doubles in size each time a node within the directory overflows, the number of
nodes in a directory is a power of two. These nodes are stored in consecutive
secondary storage locations. When a record is to be retrieved, its hash key is
used 1) to index into the directory, and 2) to locate the pointer to the base file
node which contains the record. Additional details about this structure are
given in Sections 2.3.3 and 2.3.4.

Unordered files are the simplest of file structures. New records are
appended to the end of the file; records to be deleted are marked deleted and
are not removed. Nodes of unordered files do not have overflow blocks. Figure
2.5  shows  an  unordered  file  whose  base  file  nodes  are  stored  in  consecutive 
secondary  storage  locations.  Figure  2.6  shows  an  unordered  file  that  is  similar 
to  IBM’s  ESDS  structure  ([IBM76])  where  nodes  need  not  be  stored  sequentially. 

Hash based files use overflow to accommodate file growth (Fig. 2.7). A hash
based structure consists of a fixed number of base file nodes, called buckets,
where each bucket is identified by a hash key. The structure of Figure 2.7 uses
separate chaining to handle overflow records (i.e., overflow records of a bucket
are linked in a linear list fashion). Although the simple file model restricts our
study to structures that use separate chaining, note that other overflow
methods, such as open overflow ([SeDu76]), have been used.

² Note that the directory of the dynamic hash based file of Figure 2.4 is implemented in a
slightly different, but equivalent, way from the directories in [FNPS79], [Lar78], and [Lit78].
The main distinction is that there is precisely one cluster index record that contains a
pointer to any given base file node. The structure of [FNPS79], for example, allows multiple
pointers.
Figure 2.5 An Unordered File

Figure 2.6 Another Unordered File

Figure 2.7 A Hash Based File

Figure 2.8 A B+ Tree


As a final example, a B+ tree is illustrated in Figure 2.8. B+ trees maintain
records sorted on ascending (logical valued) cluster keys and use node splitting
to accommodate file growth. These trees are similar to IBM’s VSAM ([IBM76]).³

2.2  Operations  on  Simple  Files 

Operations on a simple file are actions directed toward one or more of its
records. An integral step in the execution of these operations is to locate
desired records. We begin our discussion of simple file operations with a review
of basic relationships between queries and file searching strategies.

2.2.1  Queries  and  File  Searching  Strategies 

A record qualification predicate, called a query, consists of a number of
clauses of the form:

    (Attribute ρ value)   where ρ is one of {<, ≤, =, ≠, ≥, >}

and

    (value1 ρρ1 Attribute ρρ2 value2)   where ρρi is one of {<, ≤}

Clauses of a query are connected by the operators AND and OR to form a
predicate in disjunctive normal form. The response set of a query is the set of
records that satisfy the query.
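
A query in disjunctive normal form can be evaluated mechanically, as the following sketch suggests. The representation (a list of conjunctions, each a list of (attribute, operator, value) triples), the operator spellings, and the sample records are all assumptions made for illustration.

```python
import operator
from typing import Dict, List, Tuple

Clause = Tuple[str, str, int]   # (attribute, comparison operator, value)
Query = List[List[Clause]]      # OR of conjunctions; each conjunction is an AND of clauses

# Assumed operator spellings for this sketch.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


def satisfies(record: Dict[str, int], query: Query) -> bool:
    """True if at least one conjunction has all of its clauses satisfied."""
    return any(all(OPS[op](record[attr], value) for attr, op, value in conj)
               for conj in query)


def response_set(records: List[Dict[str, int]], query: Query) -> List[Dict[str, int]]:
    """The records that satisfy the query."""
    return [rec for rec in records if satisfies(rec, query)]


records = [
    {"EMP#": 10, "JOBCODE": 50},
    {"EMP#": 3500, "JOBCODE": 75},
    {"EMP#": 3600, "JOBCODE": 50},
]
query = [[("JOBCODE", "=", 50), ("EMP#", "<", 100)],
         [("EMP#", ">", 3000), ("JOBCODE", "=", 75)]]
```

Here the first record satisfies the first conjunction, the second record satisfies the second conjunction, and the third record satisfies neither.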

A  response  set  is  determined  by  accessing  base  file  records  according  to  a 
file  structure  traversal  technique  called  a  search  strategy.  Search  strategies 
locate  desired  records  by  combinations  of  cluster  index  and  base  file  traversals. 
Four  popular  search  strategies  are  the  cluster  key  search,  scan,  partial  scan, 
and  range  search.  As  a  diagrammatic  aid  to  describing  these  strategies,  a  node 
traversal  sequence  characteristic  of  each  strategy  is  illustrated  on  separate 
FSATs  (Fig.  2.9). 

³ Note that B-trees ([BaMc72]) cannot be described as FSATs since data file records are
stored over several levels.

Figure 2.9 Simple File Search Strategies

A cluster key search is used to process queries that specify the cluster key
of each record of the response set. The search primarily involves a cluster
index traversal. Figure 2.9a shows a cluster key search for 2 records. Note that
only those nodes which are relevant to processing the query are accessed.

A scan accesses every record in the base file. Figure 2.9b shows that a scan
starts at the root node, followed by the access of all base file nodes. This is
possible since the root node contains a pointer to the first base file node and the
primary blocks of the base file nodes are linked together. A scan is used to
process queries when the size of the response set is unknown, and cluster keys do
not qualify records.

When  the  (maximum)  size  of  a  response  set  is  known,  and  cluster  keys  do 
not  qualify  records,  a  partial  scan  is  used.  A  partial  scan  is  similar  to  the  scan, 
except  that  the  search  terminates  as  soon  as  the  response  set  has  been 
retrieved  (Fig.  2.9c).  Partial  scans  are  used  primarily  to  process  queries  that 
specify  an  identifier  of  each  record  of  the  response  set. 

A range search is used to process queries that require cluster keys of
response set records to belong to a specified range of values. The strategy
involves a cluster index traversal to locate the record with the lowest valued
cluster key of the range, followed by a base file traversal to access all other
records satisfying the range predicate. The search terminates when no other
base file nodes contain response set records (Fig. 2.9d).
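
The difference among the scan, partial scan, and range search can be seen by counting base file nodes accessed. The sketch below is a simplified illustration under several assumptions: the base file is a list of non-empty nodes in sequential order, records are represented by their cluster keys alone, keys are sorted across nodes, and a binary search over the highest key of each node stands in for the cluster index traversal. Access to the root node is not counted.

```python
from bisect import bisect_left
from typing import Callable, List, Tuple

BaseFile = List[List[int]]   # base file nodes in sequential order; records are cluster keys


def scan(base: BaseFile, pred: Callable[[int], bool]) -> Tuple[List[int], int]:
    """Access every base file node."""
    hits = [key for node in base for key in node if pred(key)]
    return hits, len(base)


def partial_scan(base: BaseFile, pred: Callable[[int], bool],
                 max_hits: int) -> Tuple[List[int], int]:
    """Like a scan, but stop once the known response set size is reached."""
    hits: List[int] = []
    accessed = 0
    for node in base:
        accessed += 1
        hits.extend(key for key in node if pred(key))
        if len(hits) >= max_hits:
            break
    return hits, accessed


def range_search(base: BaseFile, low: int, high: int) -> Tuple[List[int], int]:
    """Locate the first relevant node, then traverse the base file sequentially."""
    high_keys = [node[-1] for node in base]   # stand-in for the cluster index
    start = bisect_left(high_keys, low)
    hits: List[int] = []
    accessed = 0
    for node in base[start:]:
        accessed += 1
        hits.extend(key for key in node if low <= key <= high)
        if node[-1] > high:                   # later nodes hold only larger keys
            break
    return hits, accessed


base = [[1, 3, 5], [7, 9, 11], [13, 15, 17], [19, 21, 23]]
```

On this four-node file, a range search for keys 8 through 14 touches two nodes, a partial scan for the single key 9 touches two nodes, and a scan always touches all four.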

An  important  way  to  recognize  how  to  process  a  query  is  to  classify  the 
query  by  its  characteristics.  Table  2.3  classifies  a  query  according  to  the  type  of 
clauses  that  appear  in  all  conjunctions.  Each  class  is  associated  with  a  search 
strategy  that  is  appropriate  for  processing  such  queries. 
Each Conjunction of a Query         Search Strategy where the
Contains a Clause of the Form       Number of Conjunctions in a Query is:
                                    1                     2 or more

(cluster key = value)               cluster key search    cluster key search
(value1 ≤ cluster key ≤ value2)     range search          scan
(identifier = value)                partial scan          partial scan
otherwise                           scan                  scan

Table 2.3. A Query - File Search Strategy Relationship


To illustrate, consider the query involving employee numbers and jobcodes:

    (EMP# = …) AND (JOBCODE = 50) OR (EMP# > 3000) AND (JOBCODE = 75)

Suppose JOBCODE is the cluster key of the file. Since both conjunctions of the
query have a clause of the form (cluster key = value), a cluster key search is the
recommended processing strategy.⁴ Had EMP# been the cluster key, the
recommended processing strategy would be a scan, since each conjunction of the
query has a clause of the form (value1 ≤ cluster key ≤ value2). An alternative
method is to process each conjunction of the query separately. In that case,
Table 2.3 recommends the first conjunction to be processed by a cluster key
search, and the second conjunction by a range search. Table 2.3 does not
indicate when it is economical to process conjunctions of a query separately.

⁴ If JOBCODE is an identifier, it is also correct to say that the conjunctions contain clauses
of the form (identifier = value), where a partial scan would be recommended. However,
this interpretation ignores certain properties of the query and causes the selection of a less
efficient processing strategy.

The cluster keys that are most often encountered in queries are logical
valued keys. Relative location keys occasionally appear in queries, but hash
keys are never specified. Given in place of a hash key is a logical valued key
from which a hash key can be derived. Consequently, to use Table 2.3 may
require translating logical valued key clauses into hash key clauses. In general,
(logical valued key = value) translates into (hash key = value) while all other
logical valued key clauses have no translation. For example, had EMP# been
hashed, the recommended strategy would be a scan.
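
The recommendations of Table 2.3, together with the hash key translation rule, can be sketched as a small classification function. Several choices here are assumptions rather than statements from the thesis: the query representation, the function name choose_strategy, treating an equality or inequality clause on the cluster key as a range clause, and trying the rows of the table in order (with the range row applying only to single-conjunction queries). The value 20 in the first clause is a placeholder, since that value is not legible in the source.

```python
from typing import List, Sequence, Tuple

Clause = Tuple[str, str, int]   # (attribute, operator, value)
Query = List[List[Clause]]      # disjunctive normal form: OR of conjunctions

RANGE_OPS = {"=", "<", "<=", ">", ">="}


def choose_strategy(query: Query, cluster_key: str,
                    identifiers: Sequence[str] = (), hashed: bool = False) -> str:
    """Pick a search strategy following the rows of Table 2.3 (a sketch)."""
    if not query:
        return "scan"           # no WHERE clause: every record is retrieved

    def every_conjunction_has(test) -> bool:
        return all(any(test(c) for c in conj) for conj in query)

    def key_equality(c: Clause) -> bool:
        return c[0] == cluster_key and c[1] == "="

    def key_range(c: Clause) -> bool:
        # With a hashed cluster key only equality clauses translate to hash key clauses.
        return c[0] == cluster_key and (c[1] == "=" or (not hashed and c[1] in RANGE_OPS))

    def identifier_equality(c: Clause) -> bool:
        return c[0] in identifiers and c[1] == "="

    if every_conjunction_has(key_equality):
        return "cluster key search"
    if len(query) == 1 and every_conjunction_has(key_range):
        return "range search"
    if every_conjunction_has(identifier_equality):
        return "partial scan"
    return "scan"


# The example query; 20 is a hypothetical stand-in for the first EMP# value.
query = [[("EMP#", "=", 20), ("JOBCODE", "=", 50)],
         [("EMP#", ">", 3000), ("JOBCODE", "=", 75)]]
```

With JOBCODE as the cluster key the function returns a cluster key search; with EMP# as the cluster key, hashed or not, it returns a scan; and processing the two conjunctions separately under an unhashed EMP# cluster key yields a cluster key search and a range search, matching the discussion above.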

A  query  may  be  described  by  the  search  strategy  used  to  process  it,  and  by 
estimates  of  the  size  of  the  query’s  response  set.  Such  information  is  specified 
by  values  assigned  to  those  parameters  whose  definitions  are  listed  in  Table  2.4. 
This  collection  of  values  defines  the  query’s  descriptor.  Methods  of  estimating 
values  of  a  query  descriptor  are  considered  in  Section  2.3.1. 


  Ss   search strategy used to process a query: cluster key search,
       range search, partial scan, or scan
  f    selectivity of a query: expected fraction of a file that satisfies
       the query
  ef   exact selectivity of a query: exact fraction (or upper bound to the
       exact fraction) of a file that satisfies the query
  kf   cluster key selectivity: expected fraction of cluster keys that
       satisfy the query

Table 2.4. Parameters of a Query Descriptor
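
A query descriptor can be pictured as a small record, from which response set sizes follow directly from the definitions of f and ef as fractions of a file. The class and function names below, and the sample values, are assumptions for illustration; the estimation methods of Section 2.3.1 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class QueryDescriptor:
    """Values of the Table 2.4 parameters for one query (hypothetical field names)."""
    ss: str     # search strategy
    f: float    # selectivity: expected fraction of the file satisfying the query
    ef: float   # exact selectivity, or an upper bound to it
    kf: float   # cluster key selectivity


def expected_response_size(q: QueryDescriptor, n_records: int) -> float:
    """Expected number of records in the response set: f times N."""
    return q.f * n_records


def response_size_bound(q: QueryDescriptor, n_records: int) -> float:
    """Exact size (or upper bound) of the response set: ef times N."""
    return q.ef * n_records


# Hypothetical identifier query on a file of 10,000 records: at most one record qualifies.
n_records = 10_000
q = QueryDescriptor(ss="partial scan", f=1 / n_records, ef=1 / n_records, kf=0.0)
```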

2.2.2  Basic  Operations 

Basic simple file operations involve record retrieval, insertion, deletion, and
modification. These operations can be envisioned as procedures which return
zero or more records as their output. Functions that characterize these
operations are cost functions and response set size functions. A cost function of an
operation estimates the cost of performing the operation; a response set size
function estimates the number of records that are output by the operation. In
the following paragraphs, we will identify five simple file operations and define
their characteristic functions.

Let F be a simple file, F-Query be a query, and x be a record. A general
format of a retrieval operation is:

    RETRIEVE F [(A1 ··· Ak)] [WHERE F-Query] [HOLD]

where A1 ··· Ak are attributes whose values are extracted from each record of F
that satisfies F-Query. The values extracted from a record form an output
record, and the collection of all output records forms an output file. Duplicate
output records are retained. The square brackets enclose phrases that need not
be specified. Omitting "(A1 ··· Ak)" implies all attribute values are desired;
omitting "WHERE F-Query" implies all records are to be retrieved. Specifying
HOLD indicates that the records which are retrieved may be subject to
modification or deletion ([Date77]). A more detailed discussion of HOLD appears
in Section 5.3.

An  insertion  operation  has  the  form: 


    INSERT x INTO F [HOLD]


The  output  of  an  INSERT  is  the  record  x.  The  significance  of  an  output  record 
and  the  purpose  of  HOLD  will  be  explained  shortly. 


Deletion  operations  assume  two  forms.  One  is: 


    DELETE F WHERE F-Query [HOLD]


Each  record  of  F  that  satisfies  F-Query  is  output  and  then  deleted  when  a 
DELETE  is  executed. 

Subsequent to an INSERT x INTO F or prior to the deletion of some record x
in F, it is common for x to be linked to or unlinked from records of other files.
In order to perform linking operations, a correspondence between a user's copy
of x and x in F must be established. This correspondence is realized, in part, by
the output record(s) of INSERT and DELETE. Such correspondence is discussed
more fully in Section 5.3. Specifying HOLD in an INSERT or DELETE indicates
that the output record(s) of these operations are subject to record linking and
unlinking.

It is often the case that records are located more efficiently by means other
than file searching (e.g., navigating through DBTG files or using inverted lists). In
such cases deleting records with the DELETE operation may be inefficient. An
alternative format for a deletion operation is:

    REMOVE x FROM F

where x is a record of F that has been previously accessed. Unlike DELETE, a
REMOVE has no output.

A  record  update  operation  takes  the  form: 

    UPDATE x IN F

The  purpose  of  an  UPDATE  is  to  substitute  a  user’s  copy  of  record  x  for  the 
record  x  that  currently  exists  in  F.  An  UPDATE  has  no  output. 

Prior  to  issuing  a  REMOVE  or  an  UPDATE,  a  correspondence  between  a 
user’s  copy  of  record  x  and  x  in  F  must  be  established  (see  Section  5.3). 

Cost functions and response set size functions accept the descriptors F and
Q of F and F-Query as arguments. Table 2.5 lists the above simple file operations
with their characteristic functions. From the descriptions of the INSERT,
REMOVE, and UPDATE operations, we know that nINS=1, nREM=0, and nUPD=0.
Expressions that estimate other characteristic functions are developed in the
following sections.

                                                    cost         response set
simple file operation                               function     size function

RETRIEVE F [(A1 ... Ah)] [WHERE F-Query] [HOLD]     RET(F, Q)    nRET(F, Q)
INSERT x INTO F [HOLD]                              INS(F)       nINS
DELETE F WHERE F-Query [HOLD]                       DEL(F, Q)    nDEL(F, Q)
REMOVE x FROM F                                     REM(F)       nREM
UPDATE x IN F                                       UPD(F)       nUPD


Table 2.5 Characteristic Functions of Simple File Operations


2.3  Cost  Expressions  for  Simple  Files 

Important  relationships  and  useful  statistics  can  be  defined  in  terms  of  the 
parameters  of  a  file  descriptor.  Let: 

Z.;  =  number  of  nodes  on  level  i  =  i  ■  (H.  +  G^-) 

**  ±  JL.  J  J  ' 

5=1  +  1 

and 


N  =  number  of  records  in  file  =  + 

*•  — n 

j  -u 

and 


Nk = maximum number of distinct cluster keys in a file

   = 1/max(1/N, Maxk)


Using standard techniques, it is not difficult to develop precise expressions
for estimating costs of file storage and performing file operations. The storage
cost of a simple file is determined by estimating the number of primary and
overflow blocks the file will occupy, and multiplying this volume of blocks by
the storage cost per block per unit time. For a simple file with descriptor F, the
storage cost function is:

STOR(F) = \sum_{i=0}^{L-1} [ (# of primary blocks on level i) × S
                             + (# of overflow blocks on level i) × So ]


i-i 

i=0 


ZjGj 


5*0 


2.3.1  Estimating  Query  Descriptors 

Suppose a query consists of a single clause. Let <f,ef,kf> be the predicate
descriptor of the clause, where: 1) the clause's selectivity, f, is the expected
fraction of the file that is qualified by the clause. If the file contains N records,
approximately f×N records satisfy the clause. 2) ef is the clause's exact
selectivity. Typically, ef=f if the number of qualified records is known precisely,
otherwise ef=1. In general, only clauses of the form (identifier = value) have ef<1.
3) kf is the clause's cluster key selectivity. kf=f if the key of the clause is the
cluster key, or kf=1/Nk if it is possible to translate the clause into (hash key =
value), otherwise kf=1.

To estimate the predicate descriptor of a query consisting of disjunctions
and conjunctions of clauses, we assume the independence of records satisfying
the clauses of the query. ® Thus, if C_i and C_j are clauses and f_i and f_j their
selectivities, the selectivity of C_i AND C_j is f_i×f_j and the selectivity of C_i OR C_j
is f_i+f_j-f_i×f_j. The following are rules for applying these estimates:

<f1, ef1, kf1> AND <f2, ef2, kf2> = <f1×f2, ef1×ef2, kf1×kf2>

<f1, ef1, kf1> OR <f2, ef2, kf2> =
    <f1+f2-f1×f2, ef1+ef2-ef1×ef2, kf1+kf2-kf1×kf2>


® Independence assumptions are commonly used to simplify analyses. However,
approximations based on these assumptions are not always accurate. See [Chr81].




To  illustrate  their  use,  suppose  we  are  given  the  query: 


(EMP# = 40) OR (JOBCODE = 40) AND (AGE>35)


and  the  predicate  descriptors  of  its  clauses: 


iku’  4’ 


(Note  that  EMP#  is  assumed  to  be  a  cluster  key  and  an  identifier).  Applying  the 
above  rules,  we  obtain  an  estimate  of  the  query’s  predicate  descriptor: 

<1/1000 + 1/200 - 1/200000, 1, 1>


Using Table 2.3 and the knowledge that JOBCODE and AGE are not identifiers, the
search strategy used to process the query is a scan (i.e., Ss="scan"). Combining
the values of a query's predicate descriptor <f,ef,kf> and processing strategy Ss
specifies the values of the query's descriptor (see Table 2.4).
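
The combination rules can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: the names Descriptor, conj, and disj are ours, and the clause descriptors for JOBCODE and AGE are assumed values chosen only so that their conjunction has selectivity 1/200, as in the example above.

```python
from fractions import Fraction
from typing import NamedTuple


class Descriptor(NamedTuple):
    """Predicate descriptor <f, ef, kf> of a clause or query."""
    f: Fraction   # selectivity
    ef: Fraction  # exact selectivity
    kf: Fraction  # cluster key selectivity


def conj(a: Descriptor, b: Descriptor) -> Descriptor:
    """Descriptor of (a AND b) under the independence assumption."""
    return Descriptor(*(x * y for x, y in zip(a, b)))


def disj(a: Descriptor, b: Descriptor) -> Descriptor:
    """Descriptor of (a OR b) under the independence assumption."""
    return Descriptor(*(x + y - x * y for x, y in zip(a, b)))


# Assumed clause descriptors (illustrative values only).
emp = Descriptor(Fraction(1, 1000), Fraction(1, 1000), Fraction(1, 1000))
jobcode = Descriptor(Fraction(1, 50), Fraction(1), Fraction(1))
age = Descriptor(Fraction(1, 4), Fraction(1), Fraction(1))

# (EMP# = 40) OR ((JOBCODE = 40) AND (AGE > 35))
query = disj(emp, conj(jobcode, age))
print(query.f, query.ef, query.kf)
```

Exact rational arithmetic is used so that the result can be compared term by term with 1/1000 + 1/200 - 1/200000.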


For  practical  considerations,  it  is  of  value  to  note  that  predicate  descrip¬ 
tors  of  clauses  need  not  be  used  if  it  is  simpler  to  estimate  the  predicate 
descriptor  of  a  query  directly. 


2.3.2  RETRIEVE 

A  retrieval  cost  function  is  a  collection  of  expressions,  where  each  expres¬ 
sion  estimates  the  cost  of  processing  a  query  using  a  different  search  strategy. 
Retrieval  costs  are  determined  by  estimating  the  number  of  primary  and 
overflow  blocks  accessed  in  processing  a  query,  and  multiplying  this  volume  of 
blocks  times  the  access  cost  per  block:  ® 


RET(F, Q) =
    SC                           if Ss = scan
    PS(ef×N)                     if Ss = partial scan
    RS(kf×N)                     if Ss = range search
    KS(kf×Nk, min(ef,kf)×N)      if Ss = cluster key search


8 


It is worth noting that our cost estimates do not reflect parallelism of I/O.

Expressions for SC, PS, RS, and KS can be progressively developed in the
following way. Assume all records have an equal probability of being requested.
A + Ao×G_i is the expected cost of accessing all records of a node on level i. For b
nodes:


NSC(b,i)  =  cost  of  scanning  b  nodes  on  level  i 
=  b(A+AoGi) 

SC = cost of scanning the base file
   = NSC(Z_0, 0)

The cost of locating a single record within a node on level i is A + Ao×Q_i.
Generalizing:

NPS1(b,i) = cost of applying a partial scan to b nodes on level i to locate
            1 record

          = (total cost of accessing each record individually)
            / (total number of records)

          = \sum_{j=1}^{b} (H_i+G_i)×(NSC(j-1,i) + A + Ao×Q_i) / (b×(H_i+G_i))

          = A + Ao×Q_i + ((b-1)/2)×(A + Ao×G_i)

NPS(r,b,i) = cost of applying a partial scan to b nodes on level i to
             locate r records

           = NPS1(b,i) + ((r-1)/(r+1))×(NSC(b,i) - NPS1(b,i))

           = 2×(A + Ao×Q_i)/(r+1) + (A + Ao×G_i)×(b - (b+1)/(r+1))

PS(r) = cost of a partial scan of the base file to locate r records
      = NPS(r, Z_0, 0)
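
As a sanity check on the partial scan expressions, the sketch below (our own Python transcription, with arbitrary assumed parameter values) evaluates NPS both as an interpolation between NPS1 and NSC and in its closed form, so the two can be compared numerically.

```python
def nsc(b, A, Ao, G):
    """Cost of scanning b nodes on a level: b(A + Ao*G)."""
    return b * (A + Ao * G)


def nps1(b, A, Ao, G, Q):
    """Expected cost of a partial scan of b nodes to locate 1 record."""
    return A + Ao * Q + (b - 1) / 2 * (A + Ao * G)


def nps(r, b, A, Ao, G, Q):
    """Partial scan of b nodes to locate r records (interpolation form)."""
    one = nps1(b, A, Ao, G, Q)
    return one + (r - 1) / (r + 1) * (nsc(b, A, Ao, G) - one)


def nps_closed(r, b, A, Ao, G, Q):
    """Closed form of the same interpolation."""
    return 2 * (A + Ao * Q) / (r + 1) + (A + Ao * G) * (b - (b + 1) / (r + 1))


# Assumed values: block access costs A and Ao, overflow statistics G and Q.
params = dict(A=1.0, Ao=1.5, G=0.4, Q=0.2)
print(nps(5, 10, **params), nps_closed(5, 10, **params))
```

For r=1 the interpolation weight (r-1)/(r+1) vanishes and NPS reduces to NPS1; as r grows the cost approaches NSC.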


Assuming the r records are randomly distributed over b nodes, the expected fraction of
records in the b nodes that are examined is approximately r/(r+1). Hence a nonlinear
interpolation between NPS1 and NSC (i.e., the costs for r=1 and r=b(H_i+G_i)) is in order.

A range search involves the retrieval of a series of r records from one or more
consecutive base file nodes. The search begins with a cluster index traversal to
locate the node that contains the first record in the series, followed by a base
file traversal. Let:


ITC = cost of a cluster index traversal for a range search

    = \sum_{i=1}^{L-2} NPS(1, 1, i) + { NPS(1, 1, L-1)         if Rindex = 1, L > 1
                                      { NPS(1, Z_{L-1}, L-1)   if Rindex = 0, L > 1
                                      { 0                      if L = 1

Cn(r) = number of consecutive base file nodes that can contain r records

      = r / (H_0+G_0)

BTC(r) = cost of a base file traversal for Cn(r) nodes

       = NSC((Z_0 + Cn(r))/2, 0)    if L = 1 and Rindex = 0
       = NSC(Cn(r), 0)              otherwise


For the case where L=1 and Rindex=0, base file nodes must be sequentially
accessed. Consequently, in order to access the Cn(r) nodes, all nodes that
precede them must also be accessed. Concluding,


RS(r)  =  cost  of  a  range  search  to  access  r  consecutive  records 
=  ITC  +  BTC(r) 

To  estimate  the  cost  function  KS,  let: 


Y(b,r,n) = expected number of distinct nodes that are accessed when
           randomly retrieving r records from a file of n records stored
           in b nodes, n/b records per node

         = b×(1 - (1 - r/n)^(n/b))                                    (2.2)

X_i = X(k,r,i) = expected number of nodes accessed on level i (or
                 equivalently, the number of relevant records to locate on
                 level i+1) during a cluster key search for r records given k
                 cluster keys, where k ≤ r ®

               = Y(Z_i, k, Nk)    if 0 ≤ i < L
               = r                if i = -1
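
To give a feel for the node access estimate, here is a small Python sketch. It assumes that equation (2.2) reads b×(1 - (1 - r/n)^(n/b)); the function name expected_nodes and the sample values are ours.

```python
def expected_nodes(b, r, n):
    """Assumed reading of equation (2.2): b * (1 - (1 - r/n) ** (n/b))."""
    return b * (1 - (1 - r / n) ** (n / b))


# 1000 records stored in 100 nodes (10 records per node).
for r in (0, 50, 500, 1000):
    print(r, expected_nodes(100, r, 1000))
```

Retrieving no records touches no nodes, retrieving every record touches every node, and intermediate values of r touch fewer than min(r, b) nodes because several requested records may share a node.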

KS(k,r)  =  cost  of  a  cluster  key  search  for  r  records  given  k  cluster 
keys,  k^r 

=  y]Xi>^NPS(Xi.i/Xi.  1,  i) 

1  — w 

Xi-ry^NPS 1'  -^”1)  if /?i7iciex  =  1 
NPS{Xl-z>  Xi-\,  L  —  1)  if /binder  =  0 

Note  that  if  a  root  directory  exists,  a  partial  scan  of  the  nodes  of  level  L-1  is 
required  in  order  to  perform  a  cluster  key  search  on  the  remaining  levels. 

The number of records that are returned by a RETRIEVE is estimated by the
product of the query selectivity and file size:

nRET(F, Q) = f×N

Cost  and  response  set  size  estimates  for  insertion,  update,  deletion,  and  removal 
operations  are  developed  analogously  in  the  following  sections. 

2.3.3 INSERT

Record  insertion  involves  a  cluster  index  traversal  to  locate  the  node  in 
which  to  store  a  record,  followed  by  a  storage  operation.  From  the  previous  sec¬ 
tion,  the  estimated  cost  of  an  index  traversal  is  ITC.  In  the  following  case  ana¬ 
lyses,  we  assume  that  the  record  to  be  inserted  has  an  equal  probability  of  pos¬ 
sessing  any  given  cluster  key. 

Case 1. Ck=[logical valued, hash], Split=0, Ascend=0 (Hash Based and Indexed
Aggregate files)

In hash based files, for example, many records may have the same hash key.
The cost of locating and accessing the primary block of the node in which a
record is to be inserted is ITC+A. With probability 1-Pfull_0, the record is
inserted into a vacant record slot of the primary block, followed by a write of the
primary block. With probability Pfull_0, there are no vacant record slots, so the
record will be placed at the head of the overflow chain. This involves a read and
a write of an overflow block (to store the record) plus a write of the primary
block (to point to the new head of the overflow chain). The cost of a record
insertion is therefore:

INS_hash,aggregate(F) = ITC + A + (1-Pfull_0)×A + Pfull_0×(A+2×Ao)

                      = ITC + 2×(A + Pfull_0×Ao)                      (2.3)
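
The simplification in (2.3) can be verified numerically. The sketch below is our own Python transcription with assumed parameter values; it evaluates the case-by-case sum and the simplified form ITC + 2×(A + Pfull_0×Ao) side by side.

```python
def ins_hash(itc, A, Ao, pfull0):
    """Insertion cost for hash based and indexed aggregate files (2.3)."""
    # Locate and read the primary block, then either write it back, or
    # read and write an overflow block and write the primary block.
    expanded = itc + A + (1 - pfull0) * A + pfull0 * (A + 2 * Ao)
    simplified = itc + 2 * (A + pfull0 * Ao)
    assert abs(expanded - simplified) < 1e-9
    return simplified


# Assumed values: ITC = 3, A = 1, Ao = 1.5, Pfull_0 = 0.2.
print(ins_hash(3.0, 1.0, 1.5, 0.2))
```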


Case  2.  Ck=relative  location,  Split=0,  Ascend=0  (Unordered  files) 

Records are appended to, rather than inserted into, unordered files. With a
cost of ITC+A, the last node (i.e., primary block) of an unordered file is located
and accessed. With probability (1-Pfull_0), the record is appended to the file and
the primary block is written out. With probability Pfull_0, a new base file node is
created to contain the appended record. The additional node is integrated into
the unordered file by appending a cluster index record onto the last cluster
index node on level 1. This, in turn, may cause further node creations at levels
i>0. Let AC_i be the cost of appending a record on level i:

AC_i = (1-Pfull_i)×A + Pfull_i×(A + AC_{i+1})     if i<L      (2.4a)
     = A×(1-Rindex)                               if i=L      (2.4b)

(2.4b) indicates that if the unordered file has a root index, a cluster index
record may be appended at level L without incurring additional block write
costs. The estimated cost of a record append in an unordered file is:

INS_unordered(F) = ITC + A + AC_0                                     (2.5)
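
The recursion for AC_i unwinds from level L down to level 0. A minimal Python sketch follows, assuming (2.5) reads ITC + A + AC_0; the list pfull of per-level Pfull_i values and all numbers are illustrative assumptions.

```python
def append_cost(i, L, A, pfull, rindex):
    """AC_i of (2.4a)/(2.4b); pfull[i] is Pfull_i for levels i < L."""
    if i == L:
        return A * (1 - rindex)
    return (1 - pfull[i]) * A + pfull[i] * (A + append_cost(i + 1, L, A, pfull, rindex))


def ins_unordered(itc, L, A, pfull, rindex):
    """Append cost for an unordered file, read as ITC + A + AC_0."""
    return itc + A + append_cost(0, L, A, pfull, rindex)


# Two levels, root index present (Rindex = 1), assumed Pfull values.
print(ins_unordered(2.0, 2, 1.0, [0.1, 0.05], 1))
```

Since (2.4a) simplifies to A + Pfull_i×AC_{i+1}, the append cost is dominated by the single block write at level 0 unless nodes are frequently full.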

Case 3. Ck=logical valued, Split=0, Ascend=1 (Indexed Sequential)

The cost of locating a record's insertion point in an indexed sequential file
is ITC+NPS(1,1,0). Suppose that on insertion, the cluster key of a new record
falls between any adjacent pair of existing cluster keys with equal probability.
Therefore with probability (1-Pfull_0)×(H_0+Pou_0)/(H_0+G_0), the record is
inserted into a vacant slot in the primary block, followed by a write of the
primary block. With probability Pfull_0×(H_0+Pou_0)/(H_0+G_0), a record is removed
from the primary block to make room for the newly inserted record. The
removed record becomes the first record in the node's overflow chain. This
requires a read and a write of an overflow block (to store the record) plus a
write of the primary block (to point to the new head of the overflow chain).

With probability (G_0-Pou_0)/(H_0+G_0), the record will be inserted into the
overflow chain. This requires at most a read and a write of an overflow block (to
store the record) and a write of a previously accessed overflow block (to update
the overflow chain linkages). The estimated cost of a record insertion for an
indexed sequential file is:

INS_index_seq(F) = ITC + NPS(1,1,0)
                   + ( (H_0+Pou_0)×((1-Pfull_0)×A + Pfull_0×(A+2×Ao))
                       + (G_0-Pou_0)×3×Ao ) / (H_0+G_0)               (2.6)
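
The three insertion outcomes and their probabilities can be laid out explicitly. The Python sketch below is our own reading of (2.6); the argument nps110 stands for NPS(1,1,0), and every numeric value is an assumption for illustration.

```python
def ins_index_seq(itc, nps110, A, Ao, H0, G0, pou0, pfull0):
    """Insertion cost for an indexed sequential file, as read from (2.6)."""
    total = H0 + G0
    p_vacant = (1 - pfull0) * (H0 + pou0) / total   # slot free in primary block
    p_bump = pfull0 * (H0 + pou0) / total           # a record is moved to overflow
    p_chain = (G0 - pou0) / total                   # insert into overflow chain
    assert abs(p_vacant + p_bump + p_chain - 1.0) < 1e-9
    return (itc + nps110
            + p_vacant * A
            + p_bump * (A + 2 * Ao)
            + p_chain * 3 * Ao)


print(ins_index_seq(2.0, 1.3, A=1.0, Ao=1.5, H0=8, G0=2, pou0=0.5, pfull0=0.4))
```

The internal assertion confirms that the three cases are exhaustive: their probabilities sum to one for any parameter values.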

Case 4. Ck=logical valued, Split=1, Ascend=1 (B+ Trees)

The cost of locating and accessing the primary block of the base file node in
which a record is to be inserted is ITC+A. With probability 1-Pfull_0, the record
will be inserted into a vacant slot of the primary block, followed by a write of the
block. With probability Pfull_0, the node will split. This causes two primary
blocks to be written, plus additional block writes caused by the insertion of a
cluster index record on level 1. The index record insertion may cause, in turn,
other index record insertions at higher levels. Let IC_i be the cost of inserting a
record on level i:

IC_i = (1-Pfull_i)×A + Pfull_i×(2×A + IC_{i+1})     if i<L     (2.7a)
     = A×(1-Rindex)                                 if i=L     (2.7b)

(2.7b) indicates that if the B+ tree file has a root index, an index record may be
appended at level L without incurring additional block writes. The estimated cost
of inserting a record into a balanced tree is:

INS_B+tree(F) = ITC + A + IC_0                                        (2.8)


Case 5. Ck=hash, Split=1, Ascend=0 (Dynamic Hash Based Files)

The cost of locating and accessing the primary block of the base file node in
which a record is to be inserted is ITC+A. With probability 1-Pfull_0, the record
will be inserted into a vacant slot of the primary block, followed by a write of the
block. With probability Pfull_0, the base file node splits. This causes two primary
blocks to be written, plus additional block writes caused by the insertion
of a cluster index record on level 1.

Given that a node split occurs, with probability (1-Pfull_1), the cluster
index record is inserted into a vacant record slot, followed by a write of the
block. With probability Pfull_1, no vacant slot can be found, and the directory
doubles in size (see insertion algorithm in [FNPS79]). The estimated cost of
inserting a record into a dynamic hash based file is:

INS_dyna_hash(F) = ITC + A + (1-Pfull_0)×A
                   + Pfull_0×(2×A + (1-Pfull_1)×A + Pfull_1×(3×A×Z_1))   (2.9)
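
The nested cases of (2.9) map directly onto nested expressions. The following Python sketch is our transcription under the reading ITC + A + (1-Pfull_0)×A + Pfull_0×(2×A + (1-Pfull_1)×A + Pfull_1×(3×A×Z_1)); the numeric values are assumed.

```python
def ins_dyna_hash(itc, A, pfull0, pfull1, Z1):
    """Insertion cost for a dynamic hash based file, as read from (2.9)."""
    # Cost of placing the new directory (cluster index) record after a split:
    # a single block write, or a directory doubling costing 3*A*Z1.
    directory = (1 - pfull1) * A + pfull1 * (3 * A * Z1)
    return itc + A + (1 - pfull0) * A + pfull0 * (2 * A + directory)


# Assumed values: ITC = 1, A = 1, Pfull_0 = 0.1, Pfull_1 = 0.01, Z_1 = 4.
print(ins_dyna_hash(1.0, 1.0, 0.1, 0.01, 4))
```

The directory doubling term grows with Z_1, but it is weighted by Pfull_0×Pfull_1 and so contributes little when both probabilities are small.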


Collecting equations (2.3), (2.5), (2.6), (2.8), and (2.9), the estimated cost of
a record insertion is:

INS(F) =
    INS_hash,aggregate(F)    if Ck=[hash, log.val], Split=0, Ascend=0
    INS_unordered(F)         if Ck=rel.loc, Split=0, Ascend=0
    INS_index_seq(F)         if Ck=log.val, Split=0, Ascend=1
    INS_B+tree(F)            if Ck=log.val, Split=1, Ascend=1
    INS_dyna_hash(F)         if Ck=hash, Split=1, Ascend=0

The output of an INSERT is the record that was inserted:

nINS = 1


2.3.4  DELETE 


A DELETE operation involves a search for query qualified records, followed
by the deletion of these records. Assume all records have an equal probability
of being deleted. Consider the case where a simple file F uses overflow to
accommodate file growth (i.e., Split=0). The cost of locating the records to be
deleted is RET(F, Q).

In the following analyses, we assume that a file structure is updated
immediately after a record is deleted. * With probability H_0/(H_0+G_0), the
record is deleted from a primary block, followed by a write of the primary block.
With probability Pou_0/(H_0+G_0), the record to be deleted is the first record on a
node's overflow chain. To update the file requires a primary block write (to
update the overflow chain pointer) and an overflow block write (to physically
delete the record). With probability (G_0-Pou_0)/(H_0+G_0), the record to be
deleted will be somewhere past the first record on an overflow chain. Updating
the file requires two overflow block writes: one for updating an overflow pointer
and the other for deleting the record. Since the average number of deleted
records is N×f, an estimate of the cost of a deletion operation for a file that
uses overflow is:

DEL_overflow(F, Q) = RET(F, Q)
                     + N×f×( H_0×A + (A+Ao)×Pou_0
                             + 2×Ao×(G_0-Pou_0) ) / (H_0+G_0)         (2.10)
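
The deletion estimate for files that use overflow can be sketched as follows. This is our own Python reading of (2.10), built from the three cases described in the text; the argument ret stands for RET(F, Q), and all values are assumed.

```python
def del_overflow(ret, N, f, A, Ao, H0, G0, pou0):
    """Deletion cost for a file that uses overflow, as read from (2.10)."""
    per_record = (H0 * A                    # record in primary block
                  + (A + Ao) * pou0         # first record on an overflow chain
                  + 2 * Ao * (G0 - pou0)    # deeper in an overflow chain
                  ) / (H0 + G0)
    return ret + N * f * per_record


# Assumed values: RET = 10, N = 1000, f = 0.01 (about 10 deleted records).
print(del_overflow(10.0, 1000, 0.01, A=1.0, Ao=1.5, H0=8, G0=2, pou0=0.5))
```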


* Note that it may be more efficient to delay an update since it is possible for a block,
which contains a number of records to be deleted, to be written more than once.

Consider the case of dynamic hash based files. A simple deletion strategy is
to mark a record deleted and to write out the updated block. No modifications
to the structure's directory (i.e., cluster index) occur during a DELETE, so it is
possible for base file nodes to be empty. The estimated cost of a DELETE is
therefore:

DEL_dyna_hash(F, Q) = RET(F, Q) + N×f×A                               (2.11)

For  the  case  of  B+  trees,  it  is  necessary  to  have  in  main  memory  not  only 
the  base  file  node  that  contains  the  record  to  be  deleted,  but  also  all  cluster 
index  nodes  that  are  ancestors  of  the  base  file  node.  This  is  required  since 
the  merging  or  removal  of  nodes  at  one  level  triggers  node  updates  at  higher 
levels. 

The  search  strategies  of  Section  2.2.1  can  be  modified  to  ensure  that  the 
ancestors  of  each  relevant  base  file  node  are  accessed.  As  a  diagrammatic  aid 
to  describing  these  modified  strategies,  Figure  2.10  illustrates  on  separate 
FSATs  a  representative  collection  of  nodes  which  are  accessed  during  a  modified 
scan,  a  modified  partial  scan,  and  a  modified  range  search.  Note  that  a  cluster 
key  search  always  accesses  ancestor  nodes,  so  no  modification  of  this  search 
strategy  is  necessary. 

In Appendix I, expressions are developed to estimate the cost of processing
queries for each of the modified search strategies. Let MRET(F, Q) be the cost
of locating records using these strategies.

With probability 1-Pou_0, a record can be deleted from a primary block
without causing a node merge. With probability Pou_0×Pmer_0, a record deletion
causes the merging of two nodes into a single node. This is accomplished by
reading an adjacent base file node, merging the records of the nodes, and
writing out the merged node. This merging causes a cluster index record to be
deleted on level 1, which in turn may cause additional node mergings. With
probability Pou_0×(1-Pmer_0), the records of two nodes are redistributed so that
both contain approximately the same number of records. This action requires a
read of an adjacent node, a record redistribution, a write of the updated nodes,
and an update of a cluster index record to one of these nodes.

An ancestor of a base file node is a node which is on the path from the root node to the
base file node.

[Figure 2.10 Simple File Modified Search Strategies; panels include Modified
Partial Scan and 2.10d Modified Range Search]

Let IU_i be the cost of updating a cluster key of a cluster index record on
level i, and let DC_i be the expected cost of a record deletion on level i:


lUi  m 


0  if  a=I 


DC, 


0 

li^Pou,)yiA  f  PoUift/^9fiff(BxA^DCi4i) 


An estimate of the cost of a DELETE for a B+ tree is:

DEL_B+tree(F, Q) = MRET(F, Q) + N×f×DC_0                              (2.12)


From equations (2.10) - (2.12), the estimated cost of a DELETE operation is:

DEL(F, Q) =
    DEL_overflow(F, Q)     if Split=0
    DEL_dyna_hash(F, Q)    if Split=1, Ck=hash
    DEL_B+tree(F, Q)       if Split=1, Ck=logical valued

The output of a DELETE is the set of records that are deleted:

nDEL(F, Q) = N×f


2.3.5 REMOVE

A REMOVE is used to delete a previously accessed record. Assume all
records have an equal probability of being deleted. To estimate removal costs,
consider the case where a simple file uses overflow to accommodate file growth,
or is a dynamic hash based file. With probability H_0/(H_0+G_0), the record
resides in a primary block, and is removed by marking the record deleted and
writing out the primary block. With probability G_0/(H_0+G_0), the record resides
in overflow. In order to remove the record from the overflow chain, it is
necessary to update the overflow pointer of the overflow record prior to the
removed record. Since the prior overflow record is not known, a DELETE operation
is issued to remove the designated record. This is accomplished by generating a
query which specifies the cluster key of the removed record, and supplying this
query to DELETE.

Now suppose the simple file is a B+ tree. With probability 1-Pou_0, a record
can be removed by marking it deleted and writing out the updated node. With
probability Pou_0, a record's removal causes a node merge or node balance and
updates to the structure's cluster index. Since ancestor cluster index nodes
may not be memory resident, a DELETE operation is issued to remove the
designated record.

The estimated cost of a REMOVE operation is:

REM(F) =
    (1-Pou_0)×A + Pou_0×DEL(F, ONE)           if Ck=log.val, Split=1, Ascend=1
    (H_0×A + G_0×DEL(F, ONE)) / (H_0+G_0)     otherwise

where:

query
descriptor   f      ef     kf      Ss

ONE          1/N    1/N    1/Nk    cluster key search

Since a REMOVE has no output,

nREM = 0
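
The two branches of the REMOVE estimate can be written as one function. The sketch below is our own Python reading, in which del_one stands for DEL(F, ONE) and is supplied by the caller; the flag is_bplus_tree and the numeric values are assumptions for illustration.

```python
def rem_cost(A, del_one, H0=None, G0=None, pou0=None, is_bplus_tree=False):
    """REMOVE cost; del_one is the cost DEL(F, ONE) of a one-record DELETE."""
    if is_bplus_tree:
        # Mark-and-write unless the removal triggers a merge or rebalance.
        return (1 - pou0) * A + pou0 * del_one
    # Overflow files and dynamic hash files: cheap only in the primary block.
    return (H0 * A + G0 * del_one) / (H0 + G0)


print(rem_cost(1.0, 5.0, H0=8, G0=2))
print(rem_cost(1.0, 5.0, pou0=0.1, is_bplus_tree=True))
```

In both branches the cost is a weighted average of a single block write and a full one-record DELETE, so REMOVE is never more expensive than DEL(F, ONE) when A is at most DEL(F, ONE).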

2.3.6 UPDATE


The purpose of an UPDATE is to substitute a user's copy of record x for the
record x that currently exists in a simple file F. Assume that all records have an
equal probability of being updated. Also assume that the block of F which
contains x is already in main memory so that an UPDATE x writes this block to
secondary storage once the modifications to x have been effected. Therefore,
the cost of an UPDATE is approximately:

An  UPDATE  has  no  output: 

nUPD  =  0 


Note that if a user's copy of x is identical to x in F, then a block write need not occur.
Such an event happens occasionally, particularly in the context of LINK and UNLINK
operations (see Section 4.2.2).

CHAPTER 3. A MODEL OF FILE EVOLUTION


File  evolution  is  caused  by  record  insertions  and  deletions.  Performance 
deterioration,  which  often  accompanies  file  evolution,  occurs  when  operations 
become  progressively  more  expensive  to  execute.  In  the  case  of  hash  based  and 
indexed  sequential  files,  expected  record  retrieval  costs  increase  as  overflow 
chains  become  longer. 

Changes  in  the  values  of  a  file  descriptor  come  as  a  result  of  file  evolution. 
With  descriptors  as  arguments,  the  cost  functions  of  Chapter  2  can  be  used  to 
assess  the  performance  of  a  simple  file.  Knowing  the  history  of  the  values 
assigned  to  a  file  descriptor,  these  cost  functions  can  be  used  to  trace  the  file’s 
performance  evolution.  Future  performance  is  estimated  by  predicting  changes 
in the file's current descriptor. A goal of a model of file evolution is to develop a
methodology  for  predicting  such  changes. 

In  this  chapter,  we  present  a  model  of  file  evolution  that  unifies  a  number  of 
formerly  disparate  results.  Evolution  of  hash  based,  indexed  aggregate,  indexed 
sequential,  and  B+  trees  will  be  considered.  To  summarize  the  contributions  of 
the  analyses  in  this  chapter,  the  following  table  indicates  where  our  analyses  are 
new,  and  when  they  reduce  to  previous  results. 

file structure        insertions    # of insertions =    # of insertions ≠
                      only          # of deletions       # of deletions

hash based            [SeDu76]      [Van73]              new
indexed aggregate     new           new                  new
indexed sequential    new           new                  new
B+ trees              [NaMi78]      new                  new

We  begin  with  a  discussion  of  some  basic  assumptions  and  concepts. 

3.1  Preliminaries 

3.1.1  Basic  Assumptions 

File  performance  models  have  been  based  traditionally  on  uniform  distribu¬ 
tion  assumptions.  Three  of  these  assumptions  concern  the  retrieval,  deletion, 
and  insertion  of  individual  records: 

(A1) All records have an equal probability of being requested.

(A2)  All  records  have  an  equal  probability  of  being  deleted. 

(A3’)  Records  that  are  inserted  have  an  equal  probability  of  possessing  any 
given  cluster  key. 

Of  these  assumptions,  (A3')  can  be  generalized: 

(A3) Records that are inserted have cluster keys which are randomly
chosen from a static, perhaps nonuniform, distribution of keys.

The generalization can be understood in the following way. It is well known that
key distributions are not lexicographically uniform. For example, there is a
greater probability of selecting a name beginning with the letter 's' from a
phone directory than a name beginning with 'z'. It is possible, however, to
rename each key in the key space uniquely so that the resulting distribution is
uniform. In the case of a phone directory, one could replace the ith name with
the value i. * For the purposes of predicting file statistics, therefore, it is
reasonable to treat a file's key distribution as though it were uniform. Thus,
assumption (A3) reduces to (A3').

* More formally, let x belong to key space S and F(x)=Prob(key≤x), where key ∈ S. Let x'
belong to key space S' and U(x')=Prob(key'≤x')=x'. x is identified with x' according to
F(x)=x'. Note that F is bijective (see [DST75, p.226]).

(A1), (A2), and (A3) are the principles on which our model of file evolution is
developed.

3.1.2 Node Occupancy Distributions

File evolution affects the values of the file parameters of a file descriptor;
cost and design parameter values remain constant. Values of file parameters
are not assigned independently, but are statistics of a common distribution,
here called a node occupancy distribution. Node occupancy distributions take
the form T_i(x,y) = number of nodes on level i that contain x records in the
primary block and y records in overflow. • From assumption (A1), it follows that:

Z_i = \sum_{x,y} T_i(x,y)                N = \sum_{x,y} (x+y)×T_0(x,y)

H_i = \sum_{x,y} x×T_i(x,y) / Z_i        G_i = \sum_{x,y} y×T_i(x,y) / Z_i

Other file parameters, such as Pfull_i, Pou_i, and Pmer_i, have definitions that are
file structure dependent. Their definitions will be given in Section 3.3.1.

A model of file evolution, therefore, is identified with the problem of
predicting node occupancy distributions. In this thesis, we limit our studies to
node occupancy distributions of base files. Distributions for other levels of a
simple file can be investigated in analogous ways.

• Note that a node occupancy distribution T_i(x,y) may be viewed as an unnormalized
probability distribution whose normalization constant is Z_i.
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3.2  Analysis 


3.2.1  Hash  Based  Files 

Techniques of estimating node occupancy distributions follow from some simple observations. Recall that the nodes of a base file partition a data space of linearly ordered cluster keys. Each node is identified with a distinct interval, and all records in that node possess cluster keys belonging to that interval. Also recall that assumption (A3) enables us to treat cluster key distributions as though they were uniform. Therefore, by representing a cluster key data space as the real number interval [0,1], the length of a node's key interval equals the probability of inserting a record into that node.

Consider a hash based file. Each base file node is assigned a distinct hash key (i.e., key intervals contain precisely one key). Since there are Z_0 base file nodes, the probability of inserting a record into a node is 1/Z_0. (In this chapter, we will abbreviate Z_0 with Z.)

Additional estimation techniques are best introduced by a simple example. Suppose a hash based file is initially empty, and is to undergo a sequence of insertions. Let the node occupancy distribution for this file be B(I,r) = number of base file nodes containing r records given I records have been inserted. Two identities are immediately apparent:

\[
Z = \sum_{r} B(I,r) \qquad I = \sum_{r} r \times B(I,r)
\]

An empty file is specified by the initial conditions:

\[
B(0,r) = \begin{cases} Z & \text{if } r = 0 \\ 0 & \text{otherwise} \end{cases}
\]

To obtain values of other B(I,r), difference equations can be deduced. Consider the value of B(I+1,r). Given that the file size is I, B(I+1,r) equals the sum of the expected number of nodes containing r records in which no record was inserted, plus the expected number of nodes containing r-1 records in which a record was inserted:

\[
B(I+1,r) = B(I,r)(1 - 1/Z) + B(I,r-1)/Z
\]

index ranges: I ≥ 0, r > 0

A more convenient and diagrammatic way of expressing such relationships is in an expected flow graph (EFG)³ (see Figure 3.1a). Properties of an EFG are: 1) Each vertex (I,r) represents the set of all nodes containing r records given I records have been inserted. The expected node population of vertex (I,r) is B(I,r). 2) An EFG describes the flow of nodes into and out of designated vertices. The vertices of interest in Figure 3.1a are (I+1,0) and (I+1,r). 3) Arcs between vertices indicate possible transitions for nodes. Inserting a record into a node with r-1 records when the file is of size I is represented by the arc from (I,r-1) to (I+1,r). 4) Each arc is labeled with the expected probability that a node will make the transition when an insertion occurs. The probability that a node will go from vertex (I,r-1) to (I+1,r) is 1/Z, which is the probability that a record will be inserted into that node. 5) Flow is conserved: all nodes that enter a vertex will leave the vertex. Applying "flow in = flow out" to the designated vertices of Figure 3.1a yields:

\[
B(I+1,0) = B(I,0)(1 - 1/Z)
\]

\[
B(I+1,r) = B(I,r)(1 - 1/Z) + B(I,r-1)/Z
\]

index ranges: I ≥ 0, r > 0

³ EFGs are related to discrete time Markov chains ([Klei75]).

Figure 3.1 EFGs of a Hash Based File


Solving the above equations yields the familiar weighted binomial distribution:

\[
B(I,r) = Z \binom{I}{r} \left(\frac{1}{Z}\right)^{r} \left(1 - \frac{1}{Z}\right)^{I-r}
\]

which was proposed by [SeDu76] for calculating file statistics of static hash based files.
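The agreement between the difference equations and the weighted binomial distribution is easy to check numerically. The sketch below is a minimal illustration; the function names and the parameter values (Z = 10 nodes, 25 insertions) are assumptions chosen for the example.

```python
from math import comb


def evolve_hash(Z, inserts):
    """Iterate B(I+1,r) = B(I,r)(1 - 1/Z) + B(I,r-1)/Z from an empty file.

    Returns a list whose entry r is the expected number of nodes with r records.
    """
    B = [float(Z)] + [0.0] * inserts          # B(0,r): all Z nodes are empty
    for _ in range(inserts):
        nxt = [B[0] * (1 - 1 / Z)]            # vertex r = 0 has no inflow
        for r in range(1, inserts + 1):
            nxt.append(B[r] * (1 - 1 / Z) + B[r - 1] / Z)
        B = nxt
    return B


def weighted_binomial(Z, I, r):
    """Closed form Z * C(I,r) * (1/Z)^r * (1 - 1/Z)^(I-r)."""
    return Z * comb(I, r) * (1 / Z) ** r * (1 - 1 / Z) ** (I - r)


Z, I = 10, 25
B = evolve_hash(Z, I)
worst = max(abs(B[r] - weighted_binomial(Z, I, r)) for r in range(I + 1))
print(f"largest deviation from the closed form: {worst:.2e}")
```

The iteration also preserves the two identities stated earlier: the entries sum to Z, and the r-weighted sum equals the number of insertions.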

In the development of flow conservation equations, observe that no information is needed about where nodes flow once they have left a designated vertex. For example, a node can go from vertex (I+1,r) to (I+2,r) or (I+2,r+1) when a record is inserted, yet the flow equation about (I+1,r) makes no reference to (I+2,r) and (I+2,r+1). Consequently, the EFGs of Figure 3.1a can be simplified (see Figure 3.1b). The simplification involves replacing all arcs that leave a designated vertex with a single arc, and labelling the new arc with the value 1 (which is the sum of the labels on the arcs that were removed). The new arc does not terminate at a vertex. This indicates that information about flow destination is not needed. Henceforth, this simplified notation will be used.

To generalize the analysis, suppose after inserting I records, D records are deleted. We know, by inspection, that the probability of deleting i records from a node containing r records is $\binom{r}{i}\binom{I-r}{D-i}/\binom{I}{D}$. It follows that the number of nodes with r records given I insertions followed by D deletions is S(I,D,r):

\[
S(I,D,r) = \sum_{i=0}^{D} Z \binom{I}{r+i} \left(\frac{1}{Z}\right)^{r+i} \left(1-\frac{1}{Z}\right)^{I-r-i} \binom{r+i}{i} \binom{I-r-i}{D-i} \Big/ \binom{I}{D}
= B(I-D,r) \tag{3.1}
\]
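Equation (3.1) can be confirmed numerically for small cases. The following Python sketch evaluates the sum term by term and compares it with B(I-D,r); the function names and the parameter values (Z = 8, I = 30, D = 12) are assumptions made for illustration.

```python
from math import comb


def B(Z, I, r):
    """Weighted binomial: expected number of nodes with r records after I insertions."""
    if r < 0 or r > I:
        return 0.0
    return Z * comb(I, r) * (1 / Z) ** r * (1 - 1 / Z) ** (I - r)


def S(Z, I, D, r):
    """Expected number of nodes with r records after I insertions and D deletions."""
    total = 0.0
    for i in range(D + 1):
        if r + i > I:                 # a node cannot have held more than I records
            break
        # hypergeometric probability that i of the node's r+i records are deleted
        p_delete = comb(r + i, i) * comb(I - r - i, D - i) / comb(I, D)
        total += B(Z, I, r + i) * p_delete
    return total


Z, I, D = 8, 30, 12
worst = max(abs(S(Z, I, D, r) - B(Z, I - D, r)) for r in range(I - D + 1))
print(f"largest deviation |S(I,D,r) - B(I-D,r)|: {worst:.2e}")
```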

Equation (3.1), or Van der Pool's result [Van73], proves that the node occupancy distribution for hash based files is binomially distributed, regardless of the number and order of record insertions and deletions.

The order of insertions and deletions, however, does affect the number of records in overflow, and consequently, it affects the values of a file descriptor. Such dependencies are modeled by T(N,x,y) = the number of nodes with x records in the primary block and y records in overflow, given that there are N records in the file.

Let N_0 be the initial number of records in the file. Setting R, an abbreviation of R_0, to be the record capacity of a primary block, the initial node occupancy distribution is:

\[
T(N_0,x,y) = \begin{cases}
B(N_0,x) & x \le R,\ y = 0 \\
B(N_0,R+y) & x = R,\ y > 0 \\
0 & \text{otherwise}
\end{cases}
\tag{3.2}
\]

The EFGs of Figures 3.2 and 3.3 reflect simple record insertion and deletion algorithms for hash based files. Flow conservation yields the following system of equations for insertion:⁴

\[
\begin{aligned}
T(N+1,x,y) &= T(N,x,y)\left(1-\frac{1}{Z}\right) + T(N,x-1,y)/Z \\
T(N+1,R,y) &= T(N,R,y)\left(1-\frac{1}{Z}\right) + \left[T(N,R-1,y) + T(N,R,y-1)\right]/Z
\end{aligned}
\tag{3.3}
\]

index ranges: 0 ≤ x < R, y ≥ 0

and for deletion:

\[
T(N-1,x,y) = T(N,x,y)\left(1-\frac{x+y}{N}\right) + \left(\frac{x+1}{N}\right)T(N,x+1,y) + \left(\frac{y+1}{N}\right)T(N,x,y+1)
\tag{3.4}
\]

index ranges: 0 ≤ x ≤ R, y ≥ 0

⁴ T(-1,x,y) = T(N,-1,y) = T(N,x,-1) = T(N,R+1,y) = 0 for all positive N, x, y.

Figure 3.2 EFGs of a Hash Based File Record Insertion Algorithm

Figure 3.3 EFGs of a Hash Based File Record Deletion Algorithm


Equations (3.3) and (3.4) describe how a node occupancy distribution is altered when a single record insertion or deletion occurs. Owing to their complexity, no explicit solutions to these equations are known. However, solutions are derived in the following sections by starting with the initial distribution (3.2) and applying equations (3.3) and (3.4) according to the sequence of insertions and deletions that modifies the file. In this way, the evolution of the file's node occupancy distribution is modeled. As an aid to describing insertion and deletion sequences, some additional notation is introduced in Section 3.3.1.
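The iterative procedure just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis's implementation: the distribution is stored as a dictionary keyed by (x, y), each stored record is assumed equally likely to be the one deleted, and the names `hash_insert_step` and `hash_delete_step` are invented for the example. Each function pushes the expected node population of every vertex forward along its outgoing arcs, which is equivalent to applying "flow in = flow out" at every vertex.

```python
from collections import defaultdict


def hash_insert_step(T, Z, R):
    """One expected-value insertion: a node is hit with probability 1/Z."""
    new = defaultdict(float)
    for (x, y), n in T.items():
        new[(x, y)] += n * (1 - 1 / Z)                  # nodes not receiving the record
        target = (x + 1, y) if x < R else (R, y + 1)    # primary block first, then overflow
        new[target] += n / Z
    return dict(new)


def hash_delete_step(T, N):
    """One expected-value deletion from a file holding N records."""
    new = defaultdict(float)
    for (x, y), n in T.items():
        new[(x, y)] += n * (1 - (x + y) / N)            # nodes that lose nothing
        if x > 0:
            new[(x - 1, y)] += n * x / N                # record removed from the primary block
        if y > 0:
            new[(x, y - 1)] += n * y / N                # record removed from overflow
    return dict(new)


Z, R = 5, 2
T, N = {(0, 0): float(Z)}, 0
for _ in range(20):                                     # 20 insertions into an empty file
    T, N = hash_insert_step(T, Z, R), N + 1
for _ in range(5):                                      # followed by 5 deletions
    T, N = hash_delete_step(T, N), N - 1
print(N, round(sum(T.values()), 6))
```

After the deletions, some vertices with a vacant primary slot and a nonempty overflow chain carry positive population, a state that insertions alone cannot produce.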

3.2.2  Indexed  Aggregate  and  Indexed  Sequential  Files 

Indexed sequential files maintain a key ordering of records within individual nodes; indexed aggregate files do not. This seemingly minor difference is actually quite important. In network databases where records are connected via pointers, it is imperative that the physical address of a record remain constant. Moving a record requires updating all pointers to the record, an operation which is, in general, impractical. For this reason, indexed aggregate files can enjoy a greater utility in network databases than that of indexed sequential files.

Indexed aggregate and indexed sequential files partition a data space in a similar manner. When a base file is created, the records of the file are sorted on ascending cluster keys and stored so that each node contains an equal number of records. The highest valued cluster key in each node is identified with a distinct point, called an anchor point, in the data space of the file.⁵ Anchor points partition the data space by serving as endpoints of key intervals. Unlike key intervals of hash based files, these intervals are of nonuniform length.

⁵ The anchor point of the last base file node corresponds to the highest valued cluster key in the data space, rather than the highest valued key in the node.

Indexed aggregate and indexed sequential files share the same node occupancy distribution when no records are deleted. Let P(I,r) = number of base file nodes containing r records given that a total of I records have been inserted. Suppose there are Z base file nodes, each initially containing s records. The initial file size is N_0 = s × Z. Representing the cluster key data space as the interval [0,1] and using assumption (A3), the probability density function f(w) of a key interval having length w is:⁶

\[
f(w) = s\binom{N_0-1}{s}\, w^{s-1} (1-w)^{N_0-s-1}
\]

where $s\binom{N_0-1}{s}$ is the normalization constant.

After the insertion of I records, the probability b(I,r,w) that a node with interval length w will contain an additional r records is binomially distributed:

\[
b(I,r,w) = \binom{I}{r} w^{r} (1-w)^{I-r}
\]

P(I,r) is given by:⁷

\[
P(I,r) = Z \int_0^1 b(I,r-s,w)\, f(w)\, dw = \frac{N_0 \binom{N_0-1}{s} \binom{I}{r-s}}{r \binom{N_0+I-1}{r}}
\]

As s (the initial number of records per base file node) increases, the nonuniformities in cluster key interval lengths are reduced. As a consequence, P(I,r) will closely approximate the weighted binomial distribution B(N_0+I,r) defined in the previous section (Fig. 3.4a). For small s, however, the similarities are weaker. Even though the size of the file was doubled in Figure 3.4b, 25% of the nodes did not experience a record insertion. This is due to the nonuniformity of cluster key interval lengths: those nodes with long intervals will absorb the majority of record insertions. Or, in other words, long overflow chains grow faster than shorter overflow chains.⁸

⁶ Proof. s-1 cluster keys fall within the interval [0,w) with probability w^{s-1}, 1 key belongs to the interval [w,w+dw] with probability dw, N_0-s-1 keys are in (w+dw,1) with probability (1-w)^{N_0-s-1}, and 1 key is at [1,1] with probability 1 (see footnote 5). The probability of all events occurring simultaneously is proportional to the product of their individual probabilities. Integrating this product from 0 to w, and normalizing over the interval 0 to 1, yields the probability distribution function F(w):

\[
F(w) = \frac{\int_0^w t^{s-1}(1-t)^{N_0-s-1}\,dt}{\int_0^1 t^{s-1}(1-t)^{N_0-s-1}\,dt}
\]

from which f(w), the probability density function, is obtained easily.

⁷ A similar expression was derived in [Kelja74] for estimating statistics of B+ trees.

The latter observation may be explained by assumption (A3). On the average, cluster keys of a file are evenly spaced over a data space. It follows that the probability of inserting a record into a node is proportional to the number of records in the node. Specifically, if there are r records in a node and N records in the file, the probability of inserting a record into the node is r/N.

P(I,r) may be derived from EFGs in the following manner. The conditions at file creation are:

\[
P(0,r) = \begin{cases} Z & \text{if } r = s \\ 0 & \text{otherwise} \end{cases}
\]

Figure 3.5 shows the EFGs that follow from the above hypothesis on insertion probabilities. Flow conservation yields the identities:

\[
P(I+1,s) = P(I,s)\left(1 - \frac{s}{N_0+I}\right)
\]

\[
P(I+1,r) = P(I,r)\left(1 - \frac{r}{N_0+I}\right) + P(I,r-1)\left(\frac{r-1}{N_0+I}\right)
\]
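The flow equations for P(I,r) can be iterated directly and compared with the integral form of P(I,r). In the sketch below, `closed_form` is the value of Z ∫ b(I,r-s,w) f(w) dw obtained by carrying out the Beta integral; the function names and all parameter values are assumptions chosen for illustration. In particular, the parameters behind Figure 3.4b are not stated in the text, so s = 2 is only a guess that happens to reproduce the 25% figure.

```python
from math import comb


def evolve_indexed(Z, s, inserts):
    """Iterate the flow equations for P(I,r), starting from Z nodes of s records each."""
    N0 = s * Z
    P = {s: float(Z)}
    for I in range(inserts):
        N = N0 + I                                        # file size before this insertion
        nxt = {}
        for r, n in P.items():
            nxt[r] = nxt.get(r, 0.0) + n * (1 - r / N)    # node not chosen
            nxt[r + 1] = nxt.get(r + 1, 0.0) + n * r / N  # node chosen with probability r/N
        P = nxt
    return P


def closed_form(Z, s, I, r):
    """N0 * C(N0-1, s) * C(I, r-s) / (r * C(N0+I-1, r)), the evaluated integral (Z > 1)."""
    N0 = s * Z
    return N0 * comb(N0 - 1, s) * comb(I, r - s) / (r * comb(N0 + I - 1, r))


P = evolve_indexed(4, 3, 12)
worst = max(abs(P.get(r, 0.0) - closed_form(4, 3, 12, r)) for r in range(3, 16))
print(f"largest deviation from the integral form: {worst:.2e}")

# Fraction of nodes that receive no insertion while a file with s = 2 doubles in size.
untouched = evolve_indexed(500, 2, 1000)[2] / 500
print(f"untouched fraction: {untouched:.3f}")
```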

Before we extend our analysis to overflow distributions, it is instructive to consider the case where I insertions are followed by D deletions. If a node contains r records, the probability of deleting i of the r records is $\binom{r}{i}\binom{N_0+I-r}{D-i}/\binom{N_0+I}{D}$. It follows that the number of nodes with r records given I insertions followed by D deletions is E(I,D,r):

\[
E(I,D,r) = \sum_{i=0}^{D} \frac{N_0 \binom{N_0-1}{s} \binom{I}{r+i-s}}{(r+i)\binom{N_0+I-1}{r+i}} \cdot \frac{\binom{r+i}{i}\binom{N_0+I-r-i}{D-i}}{\binom{N_0+I}{D}}
\]

⁸ Note that this contrasts with hash based files where all overflow chains grow at the same rate.

Figure 3.4 P(I,r) vs. B(N_0+I,r)

Figure 3.5 EFG of an Indexed Aggregate and Indexed Sequential File


Unfortunately, E(I,D,r) cannot be simplified so that the dependence on I and D is removed. (It would have been nice had E depended only on the number of records in the file, N_0+I-D, as in the case of hash based files.) The fact that node occupancy distributions are influenced by record deletions is not surprising. Consider node B of Figure 2.1 (page 5) where a vacant record slot exists in the primary block and the overflow chain contains two records. This situation could not happen if records were only inserted. This is evident since the primary block of a node must be full before any records are placed in overflow. Thus, the state of a file reflects information about deleted records.

In  view  of  the  above,  our  hypothesis  on  insertion  probabilities  must  be  gen¬ 
eralized:  the  probability  of  inserting  a  record  into  a  node  is  proportional  to  the 
number  of  records  that  were  stored  in  the  node,  including  those  which  have 
been  deleted. 

Consider an indexed aggregate file. Let T(N,D,x,y,d) be the number of base file nodes which contain x records in the primary block, y records in overflow, and from which d records were deleted, given N records are in the file and a total of D records have been deleted. Let R be the record capacity of a primary block. The conditions at file creation are:

\[
T(N_0,0,x,y,d) = \begin{cases} Z & \text{if } x = s,\ y = 0,\ d = 0 \\ 0 & \text{otherwise} \end{cases}
\tag{3.5}
\]


The EFGs of Figures 3.6 and 3.7 reflect simple record insertion and deletion algorithms for indexed aggregate files. (Note that the EFGs follow from the generalized insertion probability hypothesis.) Flow conservation yields the following system of equations for insertion:⁹

\[
\begin{aligned}
T(N+1,D,x,y,d) &= T(N,D,x,y,d)\left(1 - \frac{x+y+d}{N+D}\right) + T(N,D,x-1,y,d)\left(\frac{x-1+y+d}{N+D}\right) \\
T(N+1,D,R,y,d) &= T(N,D,R,y,d)\left(1 - \frac{R+y+d}{N+D}\right) + \left[T(N,D,R-1,y,d) + T(N,D,R,y-1,d)\right]\left(\frac{R+y+d-1}{N+D}\right)
\end{aligned}
\tag{3.6}
\]

index ranges: 0 ≤ x < R, y ≥ 0, 0 ≤ d ≤ D

and for deletion:

\[
T(N-1,D+1,x,y,d) = T(N,D,x,y,d)\left(1 - \frac{x+y}{N}\right) + T(N,D,x+1,y,d-1)\left(\frac{x+1}{N}\right) + T(N,D,x,y+1,d-1)\left(\frac{y+1}{N}\right)
\tag{3.7}
\]

index ranges: 0 ≤ x ≤ R, y ≥ 0.
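Compared with hash based files, the only changes in (3.6) and (3.7) are the insertion weight, which counts every record ever stored in a node, and the extra coordinate d. A minimal Python sketch follows, with the same caveats as any illustration: the dictionary encoding and the names `aggregate_insert_step` and `aggregate_delete_step` are assumptions made here.

```python
def aggregate_insert_step(T, N, D, R):
    """Expected-value insertion; a node is hit with probability (x+y+d)/(N+D)."""
    new = {}
    for (x, y, d), n in T.items():
        p = (x + y + d) / (N + D)
        new[(x, y, d)] = new.get((x, y, d), 0.0) + n * (1 - p)
        target = (x + 1, y, d) if x < R else (R, y + 1, d)   # overflow only when primary is full
        new[target] = new.get(target, 0.0) + n * p
    return new


def aggregate_delete_step(T, N):
    """Expected-value deletion; every stored record is equally likely to be removed."""
    new = {}
    for (x, y, d), n in T.items():
        new[(x, y, d)] = new.get((x, y, d), 0.0) + n * (1 - (x + y) / N)
        if x > 0:
            key = (x - 1, y, d + 1)
            new[key] = new.get(key, 0.0) + n * x / N
        if y > 0:
            key = (x, y - 1, d + 1)
            new[key] = new.get(key, 0.0) + n * y / N
    return new


Z, s, R = 6, 2, 3
T, N, D = {(s, 0, 0): float(Z)}, s * Z, 0       # initial distribution (3.5)
for _ in range(12):
    T, N = aggregate_insert_step(T, N, D, R), N + 1
for _ in range(6):
    T, N, D = aggregate_delete_step(T, N), N - 1, D + 1
print(N, D, round(sum(T.values()), 6))
```

The node count Z, the record count N, and the deletion count D are all recoverable from the resulting table, which is a useful sanity check on any implementation of these equations.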


Now consider an indexed sequential file. The cluster key of the first record on a node's overflow chain is the overflow key of the node. A record is inserted into an overflow chain if its cluster key is greater than the overflow key. Let x be the number of records in the primary block, y be the number of records in overflow, and d be the number of records deleted from the node in which the record is to be inserted. With probability […]/(x+y+d), the last record in the overflow chain has the cluster key that terminates the key interval of the node. We know R-x deleted records have keys less than the overflow key; the remaining d-R+x deleted records can have any key. On the average, the cluster keys of the x+y+d records will be evenly spaced over the node's key interval. Provided y ≥ 1, it follows that […]/(R+y) is the probability that the cluster key of a new record is greater than the overflow key.

With probability […], the record with the interval terminating cluster key has been deleted. […] deleted records have keys less than the overflow key, 1 deleted record has a key that is greater. Accordingly, the probability that the cluster key of a new record is greater than the overflow key is […]. Taking a weighted sum of probabilities and rearranging terms, we have the expected probabilities p and $\bar{p}$ that the cluster key of a new record is less than and greater than the overflow key:

\[
\bar{p}(x,y,d) = [\ldots]
\]

\[
p(x,y,d) = 1 - \bar{p}(x,y,d)
\]

⁹ T(-1,D,x,y,d) = T(N,D,-1,y,d) = T(N,D,x,-1,d) = T(N,D,x,y,-1) = T(N,D,R+1,y,d) = 0 for all positive N, D, x, y, d.

Figure 3.6 EFGs of an Indexed Aggregate File Record Insertion Algorithm

Figure 3.7 EFG of an Indexed Aggregate and Indexed Sequential File


Figure 3.8 shows EFGs of a simple record insertion algorithm for indexed sequential files. Flow conservation yields (see footnote 9):

\[
\begin{aligned}
T(N+1,D,x,y,d) &= T(N,D,x,y,d)\left(1-\frac{x+y+d}{N+D}\right) \\
&\quad + \left[T(N,D,x-1,y,d)\,p(x-1,y,d) + T(N,D,x,y-1,d)\,\bar{p}(x,y-1,d)\right]\left(\frac{x+y+d-1}{N+D}\right) \\
T(N+1,D,R,y,d) &= T(N,D,R,y,d)\left(1-\frac{R+y+d}{N+D}\right) \\
&\quad + \left[T(N,D,R,y-1,d) + T(N,D,R-1,y,d)\,p(R-1,y,d)\right]\left(\frac{R+y+d-1}{N+D}\right)
\end{aligned}
\tag{3.8}
\]

index ranges: 0 ≤ x < R, y ≥ 0.

Deletion equations for indexed sequential files are given by (3.7) since both indexed aggregate and indexed sequential files use the same deletion algorithm. Validation and applications of (3.6)-(3.8) are considered in Section 3.3.

Figure 3.8 EFGs of an Indexed Sequential File Record Insertion Algorithm


3.2.3  B+  Trees 


B+ trees use node splitting, rather than overflow, to accommodate file growth. As a consequence, the number of nodes in a B+ tree varies with time. So too does the number of key intervals in a B+ tree data space partitioning.

Since B+ trees may be created in a manner identical to that of indexed sequential files, we know that key intervals are of nonuniform length. Also from the previous section, we know that the probability of inserting a record into a node is proportional to the number of records in the node.

Let Q(I,r) be the number of base file nodes containing r records given that a total of I records have been inserted. Initially, the file size is N_0 and each base file node contains s records. Therefore, at file creation time:

\[
Q(0,r) = \begin{cases} N_0/s & \text{if } r = s \\ 0 & \text{otherwise} \end{cases}
\]

Because overflow is not used, the value of s is constrained to be between the minimum and maximum number of records per node: M_r ≤ s ≤ R.¹⁰ From the insertion probability hypothesis, Figure 3.9 shows the EFGs of a B+ tree that does not experience record deletions. Flow conservation yields:

\[
\begin{aligned}
Q(I+1,M_r) &= Q(I,M_r)\left(1-\frac{M_r}{N_0+I}\right) + Q(I,R)\left(\frac{R}{N_0+I}\right) \\
Q(I+1,M_r+1) &= Q(I,M_r+1)\left(1-\frac{M_r+1}{N_0+I}\right) + Q(I,M_r)\left(\frac{M_r}{N_0+I}\right) + Q(I,R)\left(\frac{R}{N_0+I}\right) \\
Q(I+1,r) &= Q(I,r)\left(1-\frac{r}{N_0+I}\right) + Q(I,r-1)\left(\frac{r-1}{N_0+I}\right)
\end{aligned}
\tag{3.9}
\]

index ranges: M_r+1 < r ≤ R, I ≥ 0


Note that these equations are valid when the record capacity R of a primary block is even. For odd R, a node splits into two nodes with identical record populations. The corresponding equations are (see Figure 3.10):

\[
\begin{aligned}
Q(I+1,M_r) &= Q(I,M_r)\left(1-\frac{M_r}{N_0+I}\right) + 2\,Q(I,R)\left(\frac{R}{N_0+I}\right) \\
Q(I+1,r) &= Q(I,r)\left(1-\frac{r}{N_0+I}\right) + Q(I,r-1)\left(\frac{r-1}{N_0+I}\right)
\end{aligned}
\tag{3.9'}
\]

index ranges: M_r < r ≤ R, I ≥ 0

¹⁰ For B+ trees, $M_r = \lfloor (R+1)/2 \rfloor$.

Figure 3.9 EFGs of a B+ Tree for Even R

Figure 3.10 EFGs of a B+ Tree for Odd R


Equations  (3.9)  and  (3.9’)  were  first  derived  by  Nakamura  and  Mizoguchi 
[NaMi78]. 
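Both the even and odd cases can be handled by one iteration step if a split of a full node simply sends its flow to the two resulting sizes. The Python sketch below is illustrative: the names `split_sizes` and `btree_insert_step` are invented here, and the minimum node size is assumed to be the floor of (R+1)/2, so that a split of R+1 records yields sizes that sum to R+1.

```python
def split_sizes(R):
    """Sizes of the two nodes produced when a full node of capacity R receives a record."""
    low = (R + 1) // 2            # assumed minimum node population
    return low, R + 1 - low


def btree_insert_step(Q, N, R):
    """One expected-value insertion into a B+ tree base file holding N records.

    Q maps r to the expected number of nodes containing r records.
    """
    low, high = split_sizes(R)
    new = {}
    for r, n in Q.items():
        new[r] = new.get(r, 0.0) + n * (1 - r / N)     # node not chosen
        hit = n * r / N                                # node chosen with probability r/N
        if r < R:
            new[r + 1] = new.get(r + 1, 0.0) + hit
        else:                                          # full node: split into two nodes
            new[low] = new.get(low, 0.0) + hit
            new[high] = new.get(high, 0.0) + hit
    return new


R, s, N0 = 4, 3, 30
Q, N = {s: N0 / s}, N0
for _ in range(200):
    Q, N = btree_insert_step(Q, N, R), N + 1
nodes = sum(Q.values())
print(f"expected nodes: {nodes:.2f}, average records per node: {N / nodes:.2f}")
```

For odd R the two sizes coincide, so the split contributes twice to the same vertex, which matches the factor of two in the odd case.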

Incremental changes in node occupancy distributions caused by record insertions and deletions are described by very complex equations. Unfortunately, the EFGs of these equations are not simple enough to aid in understanding equation development. An alternative method is to estimate flows arising from all circumstances. Knowledge of all flows is equivalent to knowing flow conservation equations.

As in the modeling of indexed sequential files, knowledge of deleted record populations within nodes is needed to estimate expected flows accurately. Let T(N,D,x,d) be the number of base file nodes containing x records and from which d records have been deleted, given that there are N records in the file and a total of D records have been deleted. The conditions at file creation time are:

\[
T(N_0,0,x,d) = \begin{cases} Z & \text{if } x = s,\ d = 0 \\ 0 & \text{otherwise} \end{cases}
\tag{3.10}
\]


Consider record insertion. There are two sources of flow contributing to any vertex of a B+ tree EFG. One flow originates from node splitting; the other does not. The contributions not from node splitting are displayed in Figure 3.11a. Flow accounting yields:¹¹

\[
T(N+1,D,x,d) \leftarrow T(N,D,x,d)\left(1-\frac{x+d}{N+D}\right) + T(N,D,x-1,d)\left(\frac{x-1+d}{N+D}\right)
\tag{3.11a}
\]

index ranges: M_r ≤ x ≤ R, 0 ≤ d ≤ D

(Note that assignment rather than equality is appropriate because other terms are appended later.)

Now consider node splitting. Let (x,d) be the undeleted and deleted record population of a node. Before a node splits, the node's population is characterized by (R+1,d). When the split occurs, two nodes with populations (M_r,i) and ($\bar{M}_r$,d-i) are created, where $\bar{M}_r = R+1-M_r$. Let the probability that i assumes the value j be h(j,d). The flow contributions from node splitting are shown in Figure 3.11b. Flow accounting yields:

\[
\begin{aligned}
T(N+1,D,M_r,j) &\leftarrow T(N+1,D,M_r,j) + h(j,d)\left(\frac{R+d}{N+D}\right) T(N,D,R,d) \\
T(N+1,D,\bar{M}_r,d-j) &\leftarrow T(N+1,D,\bar{M}_r,d-j) + h(j,d)\left(\frac{R+d}{N+D}\right) T(N,D,R,d)
\end{aligned}
\tag{3.11b}
\]

index ranges: 0 ≤ d ≤ D, 0 ≤ j ≤ d

Evaluating assignments (3.11) over the indicated index ranges determines the values of T(N+1,D,x,d).

¹¹ T(-1,D,x,d) = T(N,D,x,-1) = T(N,D,M_r-1,d) = T(N,D,R+1,d) = 0 for all positive N, D, x, d.

Figure 3.11 EFGs of a B+ Tree Record Insertion Algorithm
3.11a. Flow Contributions not from Node Splitting
3.11b. Flow Contributions from Node Splitting

What remains to be determined is an expression for h(j,d). As in indexed sequential files, each base file node has a record whose cluster key terminates the key interval of the node. Suppose this record, here called an anchor record, has not been deleted. Assuming that the remaining R+d cluster keys of an overflowed node are evenly spaced over the node's key interval, h(j,d) is given by:

\[
h(j,d) = \binom{M_r+j-1}{j} \binom{R+d-M_r-j}{d-j} \Big/ \binom{R+d}{d}
\tag{3.12}
\]

Since the anchor record of a node may be deleted, equation (3.12) only approximates h(j,d).
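Expression (3.12) counts arrangements of the d deleted keys among the R+d non-anchor keys: j of them must precede the M_r-th undeleted key, and the remaining d-j must follow it. A quick way to gain confidence in the formula is to check that it sums to one over j. The Python sketch below does this for illustrative values (R = 6 with M_r = 3 is an assumption for the example, as is the function signature).

```python
from math import comb


def h(j, d, R, Mr):
    """Probability that j of the d deleted keys fall in the low half of a split node."""
    return comb(Mr + j - 1, j) * comb(R + d - Mr - j, d - j) / comb(R + d, d)


R, Mr = 6, 3
for d in range(4):
    row = [h(j, d, R, Mr) for j in range(d + 1)]
    print(d, [round(p, 4) for p in row], round(sum(row), 12))
```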

Now consider record deletion. There are three sources of flow contributing to any vertex of a B+ tree EFG. One flow originates from node merging; a second from node balancing; and a third not from merging or balancing. The contributions not from node merging or balancing are displayed in Figure 3.12a. Flow accounting yields (see footnote 11):

\[
T(N-1,D+1,x,d) \leftarrow T(N,D,x,d)\left(1-\frac{x}{N}\right) + T(N,D,x+1,d-1)\left(\frac{x+1}{N}\right)
\tag{3.13a}
\]

index ranges: 0 ≤ d ≤ D, M_r ≤ x ≤ R

Node merging occurs when an underflowed node with record population (M_r-1,k+1) is combined with another node (x,d) to form a single node (M_r-1+x,d+k+1). Because a maximum of R records can be stored in a node, the value of x is restricted to the values of M_r and $\bar{M}_r$. The probability of selecting a node with record population (x,d) is very nearly T(N,D,x,d)/Z.¹² Figure 3.12b shows the flow contributions of node merging, from which:

\[
\begin{aligned}
T(N-1,D+1,M_r-1+x,d+k+1) &\leftarrow T(N-1,D+1,M_r-1+x,d+k+1) + \left(\frac{M_r}{N}\right) T(N,D,M_r,k) \left(\frac{T(N,D,x,d)}{Z}\right) \\
T(N-1,D+1,x,d) &\leftarrow T(N-1,D+1,x,d) - \left(\frac{M_r}{N}\right) T(N,D,M_r,k) \left(\frac{T(N,D,x,d)}{Z}\right)
\end{aligned}
\tag{3.13b}
\]

index ranges: 0 ≤ d ≤ D, 0 ≤ k ≤ D, x = M_r, $\bar{M}_r$

The negative flow in Figure 3.12b simply indicates that every time a node (M_r-1+x,d+k+1) is created by a merge, a node (x,d) is consumed in the process.

¹² For a given N and D, $Z = \sum_{x,d} T(N,D,x,d)$.

Figure 3.12 EFGs of a B+ Tree Record Deletion Algorithm
3.12a Flow Contributions not from Merging or Balancing
3.12c Flow Contributions from Node Balancing

Node balancing occurs when records of an underflowed node with record population (M_r-1,k+1) are combined with one or more records of a node (x,d) to form two nodes (bal,j+k+1) and ($\overline{bal}$,d-j) that contain approximately equal non-deleted record populations. bal and $\overline{bal}$ are taken as:

\[
bal = \left\lfloor \frac{M_r-1+x}{2} \right\rfloor, \qquad \overline{bal} = \left\lceil \frac{M_r-1+x}{2} \right\rceil
\]

and x > $\bar{M}_r$.

In the process of moving bal-M_r+1 undeleted records from (x,d) to (M_r-1,k+1), j deleted records are also 'transferred'. These j records have cluster keys that are less than the key of the (bal-M_r+1)th undeleted record. In effect, the key interval of the underflowed node is being enlarged by the addition of bal-M_r+1+j keys, with the key of the (bal-M_r+1)th undeleted record terminating the extended interval. At the same time, the key interval of node (x,d) is being shortened correspondingly.

Let the probability that j deleted records are 'transferred' be g(j,x,d), where 0 ≤ j ≤ d. The flow contributions from node balancing are shown in Figure 3.12c, from which:

\[
\begin{aligned}
T(N-1,D+1,bal,j+k+1) &\leftarrow T(N-1,D+1,bal,j+k+1) + \left(\frac{M_r}{N}\right) T(N,D,M_r,k) \left(\frac{T(N,D,x,d)}{Z}\right) g(j,x,d) \\
T(N-1,D+1,\overline{bal},d-j) &\leftarrow T(N-1,D+1,\overline{bal},d-j) + \left(\frac{M_r}{N}\right) T(N,D,M_r,k) \left(\frac{T(N,D,x,d)}{Z}\right) g(j,x,d) \\
T(N-1,D+1,x,d) &\leftarrow T(N-1,D+1,x,d) - \left(\frac{M_r}{N}\right) T(N,D,M_r,k) \left(\frac{T(N,D,x,d)}{Z}\right) g(j,x,d)
\end{aligned}
\tag{3.13c}
\]

index ranges: $\bar{M}_r$ < x ≤ R, 0 ≤ d ≤ D, 0 ≤ k ≤ D, 0 ≤ j ≤ d

Evaluating assignments (3.13) over the indicated index ranges determines the values of T(N-1,D+1,x,d).

An expression for g(j,x,d) remains to be determined. Suppose the anchor record of node (x,d) has not been deleted. Assuming that the remaining x+d-1 records have cluster keys that are evenly spaced over the key interval of the node, g(j,x,d) is given by:

\[
g(j,x,d) = \binom{bal-M_r+j}{j} \binom{x+d-2-bal+M_r-j}{d-j} \Big/ \binom{x+d-1}{d}
\tag{3.14}
\]

Again, since the anchor record of node (x,d) may be deleted, equation (3.14) only approximates g(j,x,d). Validation and applications of equations (3.11) and (3.13) are considered in the following sections.
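A small worked case may help in reading (3.14). The values below are chosen only for illustration, and bal is assumed to be the floor of (M_r-1+x)/2. Take M_r = 3, x = 5 and d = 1, so that bal = 3 and exactly bal-M_r+1 = 1 undeleted record is moved. The x+d-1 = 5 non-anchor keys contain one deleted key, and that key is 'transferred' only if it precedes the first undeleted key, that is, only if it is the smallest of the five:

\[
g(0,5,1) = \binom{0}{0}\binom{4}{1} \Big/ \binom{5}{1} = \frac{4}{5},
\qquad
g(1,5,1) = \binom{1}{1}\binom{3}{0} \Big/ \binom{5}{1} = \frac{1}{5},
\qquad
g(0,5,1) + g(1,5,1) = 1 .
\]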

3.3  Synthesis 

3.3.1  File  Events 

The general form of a node occupancy distribution in the previous sections is T(N,D,x,y,d) = number of base file nodes containing x records in the primary block, y records in overflow, and from which d records were deleted, given N records in the file and a total of D records have been deleted.¹³ Because of the identities:

\[
N = \sum_{x,y,d} (x+y)\,T(N,D,x,y,d)
\]

\[
D = \sum_{x,y,d} d \times T(N,D,x,y,d)
\]

N and D need not be explicit parameters of T. Henceforth, we will abbreviate T(N,D,x,y,d) with T(x,y,d).

¹³ For B+ trees, T(x,y,d)=0 for all y ≥ 1. For hash based files, T(x,y,d)=0 for all d ≥ 1.

Some additional notation and definitions are needed to describe sequences of insertions and deletions. Let a file event be an insertion or deletion, and let t_i be a vector containing the values of T(x,y,d) for 0 ≤ x ≤ R, y ≥ 0, d ≥ 0 after file event i. t_0 denotes the node occupancy distribution at file creation (e.g., distributions (3.2), (3.5), (3.10)). Each set of iteration equations that was developed in previous sections can now be expressed as a matrix equation. For record insertion (i.e., eqn. (3.3), (3.6), (3.8), (3.11)), we may write:

\[
t_{i+1} = t_i A_i \tag{3.15}
\]

and for record deletion (i.e., (3.4), (3.7), (3.13)):

\[
t_{i+1} = t_i \bar{A}_i \tag{3.16}
\]

where A_i and $\bar{A}_i$ are matrices whose values may be functions of the number of records in the file after event i and the total number of records deleted after event i.

By selectively multiplying A_i and $\bar{A}_i$, it is possible to model a variety of interesting situations.¹⁴ For example, a file is in steady state if the file experiences an equal number of record insertions and deletions. Van der Pool ([Van73]) modeled steady state files by having insertions and deletions occur simultaneously:

\[
\lim_{i \to \infty} [\ldots] \tag{3.17}
\]

where (3.15) and (3.16) correspond to (3.3) and (3.4). The approximation of steady state files that we adopt in this thesis is:

\[
t_{i+2} = t_i A_i \bar{A}_{i+1} \tag{3.18}
\]

As other examples, an equation that models the situation where insertions outnumber deletions by two to one is:

\[
t_{i+3} = t_i A_i A_{i+1} \bar{A}_{i+2} \tag{3.19}
\]

and (3.15) models files that experience no deletions.

¹⁴ Note that the matrix notation of this section is introduced only for expository purposes; matrix multiplications were not used in actual calculations. Instead, iterations over insertion and deletion equations according to the file event sequences of (3.15), (3.18), and (3.19) were used.

In general, the evolution of a file's node occupancy distribution is modeled by 1) composing an iteration equation which approximates the expected sequence of record insertions and deletions the file is to experience, and 2) using the equation to obtain successive t_i starting with t_0.
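The two steps above amount to a small driver loop that applies an insertion or deletion step for each event in a repeating pattern. The Python sketch below is a simplified illustration rather than the procedure used in the thesis: it tracks only the total occupancy r of each hash based node (ignoring the primary and overflow split), assumes every stored record is equally likely to be deleted, and uses invented names (`insert_event`, `delete_event`, `evolve`). The pattern "ID" corresponds to the alternating steady state sequence, "IID" to insertions outnumbering deletions two to one, and "I" to a file with no deletions.

```python
from math import comb


def insert_event(t, Z):
    """Expected-value insertion for a hash based file with Z nodes."""
    new = {}
    for r, n in t.items():
        new[r] = new.get(r, 0.0) + n * (1 - 1 / Z)
        new[r + 1] = new.get(r + 1, 0.0) + n / Z
    return new


def delete_event(t, N):
    """Expected-value deletion from a file holding N records."""
    new = {}
    for r, n in t.items():
        new[r] = new.get(r, 0.0) + n * (1 - r / N)
        if r > 0:
            new[r - 1] = new.get(r - 1, 0.0) + n * r / N
    return new


def evolve(t, N, Z, pattern, repeats):
    """Apply the event pattern ('I' = insertion, 'D' = deletion) `repeats` times."""
    for event in pattern * repeats:
        if event == "I":
            t, N = insert_event(t, Z), N + 1
        elif event == "D":
            t, N = delete_event(t, N), N - 1
        else:
            raise ValueError(f"unknown file event {event!r}")
    return t, N


Z = 10
t0, N0 = evolve({0: float(Z)}, 0, Z, "I", 40)       # load 40 records into an empty file
steady, N1 = evolve(t0, N0, Z, "ID", 50)            # alternating insertions and deletions
growing, N2 = evolve(t0, N0, Z, "IID", 10)          # two insertions per deletion
print(N1, N2)
```

For this simplified state, the steady state run returns to the weighted binomial distribution for 40 records, in line with the earlier observation that total occupancy of hash based nodes does not depend on the order of insertions and deletions.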

As mentioned previously, many file descriptor parameters are statistics of
node occupancy distributions. Table 3.1 lists those definitions that are common
to all files. Tables 3.2-3.4 list definitions that are particular to hash based,
indexed aggregate and indexed sequential, and B+ tree files. Statistics for levels
other than the base file may be obtained by setting T(x,y,d) to be the node
occupancy distribution of the level of concern and applying the definitions listed
in the appropriate tables.
Table 3.1 Statistic Definitions Common to All Files


Table  3.2.  Statistic  Definitions  Particular  to  Hash  Based  Files 
Pou = \sum_{y \ge 1} \sum_{x,d} T(x,y,d) / Z

Pfvii  =  Y(.^t'iV^)T{R.y.<i) 

v.d  ^ 

Pmer  =  0 


Table 3.3. Statistic Definitions Particular to Indexed Aggregate and Indexed Sequential Files


Pou 

PfuLl 

M? 

Pmer 

=  S  £7'(i,0,<i)/Z 

x=Mr  d 

Table  3.4  Statistic  Definitions  Particular  to  B+  Trees 

3.3.2 Validation

Extensive simulation studies were conducted in order to validate the
equations developed in Sections 3.3.1-3.3.3. Tables 3.5-3.8 show results of four such
studies. A simulation experiment involved the collection of file descriptor values
of a steady state file at uniform intervals in time. After 150 experiments, the
observed values were averaged and compared with their theoretical counterparts.
For the case of hash based, indexed aggregate, and indexed sequential
files, in at least 96% of the comparisons, the theoretical values were within the
.95 level confidence intervals of the observed values. Other simulation studies


In fact, simulation studies were done prior to equation development. For the case of
indexed sequential and B+ tree files, many sets of equations were discarded before a set was
found which explained and predicted those values observed in simulation studies.
Simulation, therefore, played an essential role in developing and validating the equations of
Sections 3.3.1-3.3.3.

In both simulation and theory, steady state files were modeled by an alternating sequence
of insertions and deletions (eqn. (3.18)).
               H ±.07          Pou ±.01        Pfull ±.01            n ±.02
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0    27.89    27.85      .45      .45      .53      .53      .28      .29
  1    27.57    27.53      .50      .50      .30      .31      .33      .35
  2    27.39    27.35      .53      .53      .24      .24      .36      .37
  3    27.25    27.21      .56      .55      .21      .21      .38      .40
  4    27.14    27.11      .58      .57      .20      .20      .39      .41
  5    27.04    27.02      .60      .58      .19      .18      .41      .42
  6    26.96    26.95      .61      .60      .18      .19      .42      .44
  7    26.89    26.85      .62      .62      .17      .17      .43      .45
  8    26.83    26.78      .64      .63      .17      .17      .44      .45

Table 3.5. A Validation Study of Steady State Hash Based Files


               H ±.02          Pou ±.01        Pfull ±.01            n ±.03
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0     3.00     3.00      .00      .00      .00      .00      .00      .00
  1     2.94     2.93      .05      .05      .14      .15      .03      .03
  2     2.81     2.82      .12      .12      .18      .18      .10      .09
  3     2.71     2.72      .17      .16      .19      .19      .17      .16
  4     2.63     2.64      .20      .19      .19      .18      .23      .22
  5     2.58     2.59      .22      .21      .19      .18      .27      .26
  6     2.54     2.56      .23      .23      .19      .20      .30      .28
  7     2.51     2.52      .24      .24      .19      .18      .33      .32
  8     2.49     2.50      .25      .24      .19      .19      .34      .33

Table 3.6. A Validation Study of Steady State Indexed Aggregate Files


               H ±.03          Pou ±.01        Pfull ±.01            n ±.04
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0     3.00     3.00      .00      .00      .00      .00      .00      .00
  1     2.94     2.93      .05      .05      .14      .15      .03      .03
  2     2.81     2.81      .12      .11      .18      .17      .10      .10
  3     2.70     2.70      .17      .18      .18      .17      .18      .10
  4     2.61     2.60      .20      .20      .17      .16      .25      .26
  5     2.54     2.55      .22      .21      .17      .16      .31      .31
  6     2.49     2.49      .24      .23      .16      .16      .36      .37
  7     2.45     2.43      .25      .25      .15      .15      .40      .42
  8     2.42     2.41      .26      .26      .15      .16      .43      .44

Table 3.7. A Validation Study of Steady State Indexed Sequential Files


Hash based file experiments had R=30, N0=480, and Z=16. Indexed aggregate and
indexed sequential file experiments had R=5, N0=48, and Z=16. Note that each row of these
tables lists predicted and simulation observed parameter values after every 16 record
insertion-deletion pairs.
               H ±.05          Pou ±.01        Pfull ±.01         Pmer ±.02
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0    10.00    10.00      .00      .00     1.00     1.00      .00      .00
  1     7.05     7.16      .18      .20      .20      .26      .51      .52
  2     6.80     6.87      .18      .21      .11      .15      .53      .53
  3     6.76     6.82      .17      .19      .09      .11      .52      .52
  4     6.76     6.78      .17      .18      .09      .10      .52      .52
  5     6.76     6.81      .17      .17      .08      .10      .51      .51
  6     6.76     6.79      .17      .17      .08      .09      .51      .51
  7     6.76     6.78      .16      .17      .08      .08      .51      .51
  8     6.76     6.77      .16      .17      .08      .09      .51      .52

Table 3.8. A Validation Study of Steady State B+ Tree Files


               H ±.04          Pou ±.01        Pfull ±.01         Pmer ±.02
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0    10.00    10.00      .00      .00     1.00     1.00      .00      .00
  1     6.76     6.79      .22      .22      .34      .34      .65      .65
  2     6.48     6.48      .20      .21      .14      .13      .63      .63
  3     6.64     6.64      .16      .16      .08      .08      .54      .54
  4     6.87     6.87      .13      .13      .09      .09      .47      .47
  5     7.03     7.04      .11      .11      .10      .10      .42      .42
  6     7.13     7.11      .10      .11      .12      .11      .40      .40
  7     7.17     7.17      .10      .11      .13      .13      .40      .39
  8     7.18     7.18      .11      .10      .14      .13      .40      .40

Table 3.9. A Validation Study of Steady Growth B+ Tree Files


± values indicate the approximate size of a .95 level confidence interval.

All experiments had R=10, N0=160, and Z=16. Note that each row of Table 3.8 lists
predicted and simulation observed parameter values after every 16 record
insertion-deletion pairs. For Table 3.9, each row lists values after every 16 record insertions.
revealed similar accuracies.


For the case of B+ tree files, only 69% of the theoretical values were within
the .95 level confidence intervals of the observed values. The decrease in
accuracy is due to the approximations of h(j,d) and g(j,x,d) (eqns. (3.12) and (3.14)).
However, note that the theoretical values of H were usually within 2% of the
observed values, and the theoretical values of Pou, Pfull, and Pmer were usually
within ±.02 of the observed values. Hence, the approximations of h(j,d) and
g(j,x,d) seem adequate. Table 3.9 shows results of a related series of experiments
where accurate estimates of h(j,d) and g(j,x,d) were not needed. In these
experiments, the theoretical values were within the confidence intervals of the
observed values.
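
The comparison behind the quoted percentages can be illustrated with a short sketch. It assumes that the ± figure printed with each simulation column is the half-width of the .95 level confidence interval, and it checks only the H columns of Tables 3.5 and 3.8; the function name `fraction_within` is an illustrative choice, not part of the original study.

```python
from typing import Sequence


def fraction_within(theory: Sequence[float],
                    observed: Sequence[float],
                    half_width: float,
                    eps: float = 1e-9) -> float:
    """Fraction of theoretical values lying inside observed +/- half_width."""
    if len(theory) != len(observed):
        raise ValueError("columns must have the same length")
    hits = sum(1 for t, o in zip(theory, observed)
               if abs(t - o) <= half_width + eps)   # eps absorbs rounding noise
    return hits / len(theory)


# H column of Table 3.5 (steady state hash based files), interval +/- .07
H_HASH_THEORY = [27.89, 27.57, 27.39, 27.25, 27.14, 27.04, 26.96, 26.89, 26.83]
H_HASH_SIM = [27.85, 27.53, 27.35, 27.21, 27.11, 27.02, 26.95, 26.85, 26.78]

# H column of Table 3.8 (steady state B+ tree files), interval +/- .05
H_BTREE_THEORY = [10.00, 7.05, 6.80, 6.76, 6.76, 6.76, 6.76, 6.76, 6.76]
H_BTREE_SIM = [10.00, 7.16, 6.87, 6.82, 6.78, 6.81, 6.79, 6.78, 6.77]

if __name__ == "__main__":
    print("hash based:", fraction_within(H_HASH_THEORY, H_HASH_SIM, 0.07))
    print("B+ tree:   ", fraction_within(H_BTREE_THEORY, H_BTREE_SIM, 0.05))
```

A single column is far too small a sample to reproduce the 96% and 69% figures; the sketch only shows the form of the test.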

Although Z, the number of base file nodes, appears to be a principal
parameter in simulation studies and theoretical calculations, it was observed that file
parameters assume asymptotic values for small Z when each node of a file
experiences, on the average, the same number of insertions and deletions. In
other words, parameter values calculated for a small file are negligibly different
from those of a large file. Typical differences are illustrated in Tables 3.10-3.14,
where theoretical values of a file with 16 nodes are compared with
simulation observed values of a file with 160 nodes.

Owing to these similarities, statistics for extremely large files can be
obtained easily. The ability to collect such statistics efficiently is very
important because the cost of simulating large files, or performing theoretical calcula-

There is some theoretical evidence to support these observations. It is well known that
the Poisson density function accurately approximates the binomial distribution function.
B(N,r), a node occupancy distribution of a hash based file derived in Section 3.2.1, can
therefore be approximated by:

B(N,r) = Z e^{-N/Z} (N/Z)^r / r!

But for the scaling factor Z, the distribution is governed by the average number of records
per node N/Z. Since the values of all file parameters, with the exception of N and Z, do not
depend on the scaling factor, statistics for small files will be identical to those of large files
whenever the files have the same N/Z ratio.
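
The scaling argument can be checked numerically. The sketch below assumes that B(N,r) is the expected number of nodes holding exactly r records when N records are spread uniformly over Z nodes (a binomial count); Section 3.2.1 is not reproduced here, so that reading is an assumption, as are the function names.

```python
import math


def binomial_nodes(n: int, z: int, r: int) -> float:
    """Expected number of nodes with exactly r of n records (assumed form of B(N,r))."""
    p = 1.0 / z
    return z * math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)


def poisson_nodes(n: int, z: int, r: int) -> float:
    """Poisson approximation: Z * exp(-N/Z) * (N/Z)^r / r!."""
    lam = n / z
    return z * math.exp(-lam) * lam ** r / math.factorial(r)


if __name__ == "__main__":
    # Two files with the same N/Z = 3 but different sizes.
    for n, z in ((48, 16), (480, 160)):
        for r in range(6):
            exact = binomial_nodes(n, z, r) / z     # fraction of nodes
            approx = poisson_nodes(n, z, r) / z
            print(f"N={n:3d} Z={z:3d} r={r}: binomial={exact:.4f} poisson={approx:.4f}")
```

Dividing by Z removes the scaling factor, so the Poisson fractions for the two files coincide exactly, while the binomial fractions differ only slightly.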
               H ±.04          Pou ±.01        Pfull ±.01            n ±.01
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0    27.89    27.80      .45      .44      .53      .51      .28      .30
  1    27.57    27.49      .50      .49      .30      .30      .33      .36
  2    27.39    27.30      .53      .52      .24      .25      .36      .39
  3    27.25    27.16      .56      .54      .21      .21      .38      .41
  4    27.14    27.07      .58      .56      .20      .20      .39      .42
  5    27.04    26.99      .60      .58      .19      .20      .41      .43
  6    26.96    26.90      .61      .60      .18      .19      .42      .44
  7    26.89    26.82      .62      .61      .17      .18      .43      .46
  8    26.83    26.75      .64      .63      .17      .17      .44      .47

Table 3.10. An Approximation Study of Steady State Hash Based Files


               H ±.02          Pou ±.01        Pfull ±.01            n ±.03
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0     3.00     3.00      .00      .00      .00      .00      .00      .00
  1     2.94     2.93      .05      .05      .14      .15      .03      .03
  2     2.81     2.79      .12      .13      .18      .18      .10      .11
  3     2.71     2.70      .17      .17      .19      .19      .17      .19
  4     2.63     2.62      .20      .20      .19      .19      .23      .24
  5     2.58     2.57      .22      .22      .19      .20      .27      .28
  6     2.54     2.52      .23      .24      .19      .21      .30      .33
  7     2.51     2.49      .24      .25      .19      .19      .33      .35
  8     2.49     2.56      .25      .26      .19      .19      .34      .38

Table 3.11. An Approximation Study of Steady State Indexed Aggregate Files


               H ±.02          Pou ±.01        Pfull ±.01            n ±.03
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0     3.00     3.00      .00      .00      .00      .00      .00      .00
  1     2.94     2.93      .05      .05      .14      .16      .03      .03
  2     2.81     2.79      .12      .13      .18      .18      .10      .11
  3     2.70     2.68      .17      .17      .18      .18      .18      .20
  4     2.61     2.60      .20      .20      .17      .18      .25      .27
  5     2.54     2.53      .22      .22      .17      .17      .31      .33
  6     2.49     2.46      .24      .24      .16      .17      .36      .39
  7     2.45     2.42      .25      .25      .15      .15      .40      .44
  8     2.42     2.37      .26      .27      .15      .15      .43      .48

Table 3.12. An Approximation Study of Steady State Indexed Sequential Files


For hash based file experiments, theoretical values were predicted for R=30, N0=480,
Z=16; simulation values were observed for R=30, N0=4800, Z=160. For indexed aggregate
and indexed sequential file experiments, theoretical values were predicted for R=5, N0=48,
Z=16; simulation values were observed for R=5, N0=480, Z=160. Note that each row of these
tables lists theoretical values after 16 record insertion-deletion pairs; simulation values after
160 pairs.
               H ±.02          Pou ±.00        Pfull ±.00         Pmer ±.00
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0    10.00    10.00      .00      .00     1.00     1.00      .00      .00
  1     7.05     7.14      .18      .19      .20      .23      .51      .51
  2     6.80     6.89      .18      .19      .11      .14      .53      .52
  3     6.76     6.83      .17      .18      .09      .11      .52      .51
  4     6.76     6.84      .17      .17      .09      .10      .52      .50
  5     6.76     6.82      .17      .17      .08      .10      .51      .51
  6     6.76     6.80      .17      .17      .08      .09      .51      .51
  7     6.76     6.79      .16      .17      .08      .09      .51      .51
  8     6.76     6.79      .16      .17      .08      .10      .51      .50

Table 3.13. An Approximation Study of Steady State Balanced Tree Files


               H ±.01          Pou ±.00        Pfull ±.00         Pmer ±.00
row   theory     sim.   theory     sim.   theory     sim.   theory     sim.
  0    10.00    10.00      .00      .00     1.00     1.00      .00      .00
  1     6.76     6.83      .22      .21      .34      .36      .65      .63
  2     6.48     6.50      .20      .20      .14      .14      .63      .63
  3     6.64     6.66      .16      .16      .08      .09      .54      .54
  4     6.87     6.85      .13      .13      .09      .09      .47      .47
  5     7.03     7.02      .11      .11      .10      .10      .42      .42
  6     7.13     7.11      .10      .11      .12      .12      .40      .41
  7     7.17     7.18      .10      .10      .13      .14      .40      .40
  8     7.18     7.16      .11      .11      .14      .14      .40      .41

Table 3.14. An Approximation Study of Steady Growth Balanced Tree Files


± values indicate the approximate size of a .95 level confidence interval.

Theoretical values were predicted for R=10, N0=160, Z=16; simulation values were observed
for R=10, N0=1600, Z=160. Note that each row of Table 3.13 lists theoretical values after
16 record insertion-deletion pairs; simulation values after 160 pairs. For Table 3.14,
theoretical values are listed after 16 record insertions; simulation values after 160 insertions.
tions on large files is prohibitive. Furthermore, these similarities make it
possible to generate tables of statistics whose values could be used in file
performance calculations. Such tables are called file design tables.

3.3.3  File  Design  Tables 


A file design table is identified with a particular file structure and three
values that characterize the structure's initial configuration. These values are:
1) R, the record capacity of a primary block; 2) D/I, the relative intensity of
deletions to insertions; and 3) N0/Z, the initial number of records per node.
D/I=1 models a steady state file and equation (3.18) applies. Equation (3.19)
is appropriate when D/I=.5, and equation (3.15) corresponds to D/I=0.

For a specific initial file configuration, a file design table lists the values of
selected file parameters at specific points in time. In this way, the evolution of
parameter values is recorded. If no table exists for a particular file
configuration, desired values may be obtained by interpolation.

File design tables for hash based, indexed aggregate, indexed sequential,
and B+ trees are given in Appendix II. These tables are not comprehensive;
their purpose is to show how extensive design tables may be compiled.

Row 0 of each table lists selected initial file parameter values. Subsequent
rows list the expected values of these parameters after, on the average, a single
record has been inserted into each node of the file. That is, the average number
of insertions per node is indexed by row numbers. The number of deletions per
node which have occurred concurrently with each insertion is governed by D/I.
For example, if D/I=.5, row 7 lists parameter values after an average of 7
insertions and 3.5 deletions have occurred per node. For files where overflow occurs,
the value of G for row j is obtained using:
G = (expected number of records per node) - H
  = (N0/Z + (1 - D/I) \times j) - H

where N0/Z and D/I are used to identify the design table, and the value of H is
taken from row j of the table.

To illustrate a use of these tables, consider the following problem:

A hash based file initially contains 36000 records stored in 1200 nodes.
The primary block capacity of each node is 30 records. If the file
undergoes 4800 insertions and 2400 deletions over the next month,
what are the initial and final values of H, G, Pou, Pfull, and n?

From the problem statement, R=30, N0/Z=30, and D/I=.5. Using Table
II.11, initial parameter values are extracted from row 0; the final values are
extracted from row 4800/1200=4:

              H       G     Pou   Pfull     n
initial     27.89    2.11   .45    .53     .28
final       28.34    3.66   .65    .43     .52
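
The arithmetic of this example can be reproduced with a few lines of code. This is a sketch under stated assumptions: the H values 27.89 and 28.34 are simply the ones quoted in the example (the design table itself is in Appendix II and is not reproduced here), and the function names are illustrative.

```python
def design_table_row(insertions: int, nodes: int) -> float:
    """Row index = average number of insertions per node."""
    return insertions / nodes


def g_from_row(n0_per_node: float, d_over_i: float, row: float, h: float) -> float:
    """G = (N0/Z + (1 - D/I) * j) - H, with H read from row j of the table."""
    return (n0_per_node + (1.0 - d_over_i) * row) - h


if __name__ == "__main__":
    records, nodes = 36000, 1200
    insertions, deletions = 4800, 2400

    n0_per_node = records / nodes              # N0/Z = 30
    d_over_i = deletions / insertions          # D/I = .5
    row = design_table_row(insertions, nodes)  # row 4

    g_initial = g_from_row(n0_per_node, d_over_i, 0, h=27.89)
    g_final = g_from_row(n0_per_node, d_over_i, row, h=28.34)
    print(f"row {row:g}: G initial = {g_initial:.2f}, G final = {g_final:.2f}")
```

When the number of insertions per node is not an integer, the row index falls between two table rows and the parameter values would have to be interpolated.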

With  the  aid  of  file  design  tables  and  the  cost  functions  of  the  simple  file 
model,  the  performance  evolution  of  a  file  can  be  predicted.  In  Section  5.2,  such 
an  approach  is  used  to  solve  the  combined  problems  of  optimal  file  designs  and 
reorganization  points. 

3.4  Remarks  on  Implementation  and  Complexity  Issues 

In preceding sections, we presented a model of file evolution that
synthesized a number of formerly disparate results ([Van73], [KeLa74], [SeDu76],
[NaMi78]). The model was shown to be sufficiently general to explain and predict
the dynamic behavior of file structures which had not been analyzed previously
(e.g., indexed sequential). We conclude this chapter with some comments on
implementation and complexity issues.
As mentioned before, matrix multiplications were not used in actual
calculations. Instead, iterations over insertion and deletion equations according to a
prescribed sequence of insertions and deletions were used. Node occupancy
distributions T(x,y,d) for indexed aggregate and indexed sequential files were
stored in a 3-dimensional single precision array ARRAY_{x,y,d} with index ranges
0 ≤ x ≤ R, 0 ≤ y ≤ Y, and 0 ≤ d ≤ Dmax. Y and Dmax were chosen so that the following
inequality was satisfied at all times:

\sum_{x=0}^{R} \sum_{y=0}^{Y} \sum_{d=0}^{Dmax} (x+y) T(x,y,d) > .9995 N

Choosing values of Y and Dmax that are too small introduces errors in
parameter value predictions. Two dimensional arrays ARRAY_{x,y} and ARRAY_{x,d} were used
to store hash based distributions T(x,y) and balanced tree distributions T(x,d).
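
The truncation test on Y and Dmax can be written down directly from the inequality. The sketch below is a minimal illustration that stores T(x,y,d) as nested Python lists indexed `T[x][y][d]`; that layout, and the names `captured_records` and `bounds_adequate`, are assumptions made for the example rather than the original implementation.

```python
from typing import List

Array3 = List[List[List[float]]]   # T[x][y][d], 0<=x<=R, 0<=y<=Y, 0<=d<=Dmax


def captured_records(T: Array3) -> float:
    """Sum of (x + y) * T(x, y, d) over the whole truncated array."""
    total = 0.0
    for x, plane in enumerate(T):
        for y, row in enumerate(plane):
            for value in row:
                total += (x + y) * value
    return total


def bounds_adequate(T: Array3, n: float, threshold: float = 0.9995) -> bool:
    """True if the array accounts for more than threshold * N records."""
    return captured_records(T) > threshold * n


if __name__ == "__main__":
    R, Y, DMAX = 2, 1, 1
    T = [[[0.0] * (DMAX + 1) for _ in range(Y + 1)] for _ in range(R + 1)]
    T[2][1][0] = 4.0    # 4 nodes with 2 primary and 1 overflow record
    T[1][0][1] = 2.0    # 2 nodes with 1 primary record
    print(captured_records(T), bounds_adequate(T, n=14.0))
```

If the test fails after some insertion or deletion step, probability mass has been lost beyond the array bounds and Y or Dmax should be enlarged.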

Table 3.15 lists the computational and storage complexities of algorithms
that update node occupancy distributions according to the insertion and
deletion equations of the previous sections. Also listed are the complexities of
algorithms for generating file design tables requiring a total of I insertions and D
deletions.


N is obtained independently of T(x,y,d) by knowing the initial file size and incrementing
and decrementing N according to the sequence of record insertions and deletions.
                      insertion     deletion      storage       design table
file                  complexity    complexity    complexity    complexity

hash based            O(RY)         O(RY)         O(RY)         O(IRY+DRY)
indexed aggregate     O(RYD)        O(RYD)        O(RYD)        O(IRYD+RYD^2)
indexed sequential    O(RYD)        O(RYD)        O(RYD)        O(IRYD+RYD^2)
balanced tree         O(D^2+DR)     O(RD^3)       O(RD)         O(ID^2+IDR+RD^4)

Table 3.15. Complexities of File Evolution Algorithms

Judging from the above entries, generating file design tables is a formidable
task, even when small files are considered. For individual tables, execution
times in excess of 20 minutes on a PDP-11/45 were not uncommon. Extensive
simulation studies typically required only a few minutes. In general, unless
explicit functions for node occupancy distributions are available, simulation is a
more efficient means of generating file design tables. Further research is
necessary to make the computations more efficient.
CHAPTER 4. A MODEL OF LINKSETS


A  linkset  is  a  structure  that  connects  records  of  one  file  to  those  of  another. 
Classical  linkset  structures  include  parent  pointers,  inverted  lists,  ring  lists,  and 
cellular  multilists. 

In this chapter, a model of linksets is presented. We begin by defining a
unifying model of linkset structures. We then identify basic operations that are
performed on linksets, and develop expressions that estimate the cost of
executing these operations. Storage requirements of linkset implementations are
also examined.

4.1  Structure  of  a  Linkset 

A  linkset  is  a  structure  that  expresses  a  relationship  between  two  (possibly 
different)  simple  files.  One  simple  file  is  the  parent  file,  the  other  is  the  child 
file.  Since  multiple  relationships  may  exist  between  files,  files  may  be  con¬ 
nected  by  a  number  of  linksets. 

The  basic  component  of  a  linkset  is  a  linkset  occurrence  which  consists  of  a 
single  parent  record  and  zero  or  more  child  records.  For  any  linkset,  each 
parent  record  is  assigned  to  a  distinct  linkset  occurrence,  and  each  child  record 
is  a  member  of  at  most  one  linkset  occurrence. 

Every  parent  record  possesses  a  distinct  key  called  a  link  key.  All  child 
records  possess  link  keys,  but  these  keys  need  not  be  distinct.  A  linkset 
occurrence  consists  of  a  parent  record  and  all  child  records  that  share  identical 
link keys. The linkset occurrence of Figure 4.1 is composed of a parent record
and child records whose link key is "Toronto".

Figure 4.1 Salient Features of a Linkset Occurrence
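
The definition of a linkset occurrence can be made concrete with a small sketch. It groups child records under the parent that shares their link key; the record contents below (cities and locations) are hypothetical sample data, and the dictionary field name `link_key` is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LinksetOccurrence:
    parent: dict                                   # exactly one parent record
    children: List[dict] = field(default_factory=list)   # zero or more children


def build_occurrences(parents: List[dict], children: List[dict],
                      key: str = "link_key") -> Dict[str, LinksetOccurrence]:
    """One occurrence per parent; each child joins at most one occurrence."""
    occurrences: Dict[str, LinksetOccurrence] = {}
    for p in parents:
        if p[key] in occurrences:
            # Parent link keys must be distinct.
            raise ValueError(f"duplicate parent link key: {p[key]!r}")
        occurrences[p[key]] = LinksetOccurrence(parent=p)
    for c in children:
        occ = occurrences.get(c[key])
        if occ is not None:        # children without a matching parent stay unlinked
            occ.children.append(c)
    return occurrences


if __name__ == "__main__":
    cities = [{"link_key": "Toronto"}, {"link_key": "Ottawa"}]
    locations = [
        {"link_key": "Toronto", "site": "A"},
        {"link_key": "Toronto", "site": "B"},
        {"link_key": "Montreal", "site": "C"},
    ]
    for name, occ in build_occurrences(cities, locations).items():
        print(name, [c["site"] for c in occ.children])
```

This captures only the logical grouping; cells, cell directories, and pointer options describe how such an occurrence is physically implemented.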


Linkset implementations exploit a partitioning of the child file into one or
more subfiles called cells. Typically, a cell's size is chosen to reflect some
convenient hardware memory boundary (e.g., track or cylinder). Linkset
occurrences are implemented in part by a cell directory, which is contained in
every parent record. A cell directory is an array of pointers that point to
distinct cells. Each identified cell contains one or more of the linkset
occurrence's child records. Such a cell is said to be occupied. A cell directory
pointer may correspond to the starting address of a cell (where child records
are identified by their content), or it may correspond to the head of a list of
child records within the cell. These are the serial and list implementations of
linksets, respectively. Figure 4.1 illustrates the salient linkset features of a
cellular multilist structure.

Optional features of linkset implementations include parent pointers, a key
ordering of child records, doubly linked lists, and ring lists (Fig. 4.2). The latter
two options apply only to list implementations of linksets. It is also possible for
link keys to be implicit in child records. That is, linksets are information
carrying ([TsLo77]) if link keys are implicitly present in child records, otherwise
linksets are noninformation carrying. The linkset of Figure 4.1 is noninformation
carrying.

4.1.1  Parameterization  of  Linksets 

Linkset  parameters  describe  available  design  options  of  linkset  implemen¬ 
tations  and  characterize  child  record  populations  of  linkset  occurrences. 
Definitions  of  these  parameters  are  listed  in  Table  4.1.  A  linkset  structure  is 

described  by  the  values  assigned  to  these  parameters.  This  collection  of  values 

The term cell directory is generic; common names for a cell directory include "inverted
list", "pointer array", "multilist", and "cellular multilist".
Figure 4.2 Some Implementation Options of Linksets (panels: Parent Pointers; Ring List; Doubly Linked List with Ring List Option; Doubly Linked List Without Ring List Option; A Key Ordering of Child Records)




Linkset Design Parameters

Parent   parent file of linkset
Child    child file of linkset
Sl       serial (=0) or list (=1) implementation of linkset
Cs       cell size: single record (=0), integral number of nodes (=1), or entire file (=2)
Cc       cell size in nodes if Cs=1, default value is 0
Pp       parent pointers *
Ko       child records linked in ascending or descending logical valued key order *
Rl       ring lists *
Dl       doubly linked lists *
Ic       linkset is information carrying *

Linkset Occupancy Parameter

W        expected number of child records per linkset occurrence

* value is 1 if option is used, 0 otherwise

Table 4.1. Parameters of a Linkset Descriptor
is called the linkset's descriptor.


Not  all  value  assignments  to  linkset  design  parameters  describe  implemen¬ 
tations  that  can  be  or  should  be  realized.  Those  assignments  that  correspond  to 
recognized  linkset  structures  are  listed  in  Table  4.2. 


Generic Structure                     Generic Structure Descriptor Values
Name                              Sl   Cs   Cc   Rl   Dl   Ko   Ic   Pp

inverted list                     0    0    0    0    0    0    0    0
DBTG set, pointer array           0    0    0    0    0    *    *    *
cellular serial                   0    1    k    0    0    0    0    0
generalized cellular serial       0    1    k    0    0    0    0    *
cellular multilist                1    1    k    0    0    0    0    0
generalized cellular multilist    1    1    k    0    *    *    *    *
multilist                         1    2    0    0    0    0    0    0
generalized multilist             1    2    0    0    *    *    *    *
DBTG set, ring list               1    2    0    1    *    *    *    *

Note: k = number of nodes per cell; * = value is 1 if option used, 0 otherwise.

Table 4.2 A Catalog of Linkset Structures

Linksets cannot be implemented between arbitrary pairs of files. Since linksets
rely on physical address pointers, all pointers to a record may have to be
updated whenever the record is moved. (The exception is when linksets are
implemented in a cellular serial fashion where records can be repositioned
freely within a cell.) Since it is preferable to update as few pointers as possible,
the child file's descriptor should have Split=0 and, if cellular serial linksets are
not used, Ascend=0. A similar situation exists for parent files. If a linkset uses
parent pointers (Pp=1) or ring lists (Rl=1), the parent file's descriptor should
have Split=0 and Ascend=0.
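
A linkset descriptor and the placement rules above can be expressed as a small data structure. This is a sketch under stated assumptions: the field names mirror the parameters of Table 4.1, `FileFlags` is a hypothetical stand-in that carries only the Split and Ascend parameters of a simple file descriptor, and the cell size of 2 nodes in the usage example is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinksetDescriptor:
    parent: str
    child: str
    sl: int            # 0 = serial, 1 = list
    cs: int            # 0 = single record, 1 = number of nodes, 2 = entire file
    cc: int = 0        # cell size in nodes, used only when cs == 1
    pp: int = 0        # parent pointers
    ko: int = 0        # key ordering of child records
    rl: int = 0        # ring lists
    dl: int = 0        # doubly linked lists
    ic: int = 0        # information carrying
    w: float = 0.0     # expected child records per linkset occurrence

    def problems(self) -> List[str]:
        """Return rule violations within the descriptor itself."""
        issues = []
        if self.sl not in (0, 1):
            issues.append("Sl must be 0 or 1")
        if self.cs not in (0, 1, 2):
            issues.append("Cs must be 0, 1, or 2")
        if self.cs != 1 and self.cc != 0:
            issues.append("Cc is used only when Cs=1")
        if self.sl == 0 and (self.rl or self.dl):
            issues.append("Rl and Dl apply only to list implementations")
        return issues


@dataclass(frozen=True)
class FileFlags:
    split: int
    ascend: int


def placement_problems(ls: LinksetDescriptor, parent_file: FileFlags,
                       child_file: FileFlags) -> List[str]:
    """Check the Split/Ascend recommendations for the parent and child files."""
    issues = []
    cellular_serial = ls.sl == 0 and ls.cs == 1
    if child_file.split:
        issues.append("child file should have Split=0")
    if not cellular_serial and child_file.ascend:
        issues.append("child file should have Ascend=0")
    if (ls.pp or ls.rl) and (parent_file.split or parent_file.ascend):
        issues.append("parent file should have Split=0 and Ascend=0")
    return issues


if __name__ == "__main__":
    cm = LinksetDescriptor("CITY", "LOCATION", sl=1, cs=1, cc=2, w=4)
    print(cm.problems(), placement_problems(cm, FileFlags(1, 1), FileFlags(0, 0)))
```

Returning a list of messages rather than raising an exception reflects the wording of the text, which states recommendations ("should have") rather than hard constraints.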

4.1.2  Examples 

An inverted file is a network of two or more simple files, one of which is the
data file, and the remaining are index files. The topology of an inverted file has
each index file connected to the data file by a single linkset, where the index file
is the parent file of a linkset. A characteristic of inverted files is that linksets
are implemented serially (Sl=0) with cells containing single records (Cs=0).
Figure 4.3 illustrates two linksets of an inverted file.

Multilist and cellular multilist files are distinguished from inverted files in
the way linksets are implemented. Multilist files implement linksets as lists
(Sl=1) where the data file resides in a single cell (Cs=2). Cellular multilists also
implement linksets as lists, but a number of cells are used (Cs=1). Suppose the
cell size of the cellular multilist structure of Figure 4.1 is k nodes. The
descriptor of this linkset is:

Linkset               Parent   Child
Descriptor            File     File        Sl  Cs  Cc  Pp  Ko  Rl  Dl  Ic  W

Cellular Multilist    CITY     LOCATION    1   1   k   0   0   0   0   0   4

Figure  4.4  illustrates  representative  linkset  occurrences  of  a  multilist  file. 

A generalized access path structure (GAPS) is a file that serves as an index
file to two or more data files ([Haer78]). Like most index files, a GAPS file is
connected to each data file by a single linkset. Figure 4.5 shows a typical inverted
list or pointer array implementation of the linksets of a GAPS file. Note that
other linkset implementations are possible.
As additional examples, a data structure diagram of a hospital database is
shown in Figure 4.6, where a box represents a simple file and an arc represents a
linkset. (Note that arcs are drawn from the parent file to the child file.) A
representative linkset occurrence for each linkset of the hospital database is
displayed in Figure 4.7, along with the descriptor of each linkset. Notice the
OCCUPANCY linkset is implemented as a ring list with parent pointers,
PATIENT_ASSIGN as a doubly linked list, and DOCTOR_ASSIGN as a pointer array
with parent pointers.

It is worth noting that the hospital database of Figure 4.7 could be an
implementation of a DBTG database [CODA78] or a relational database [Astr76].

4.2  Operations  on  Linksets 

Operations on linksets are actions directed toward one or more records of
a specified linkset occurrence. An integral step in the execution of these
operations is to locate desired records. We begin our discussion of linkset operations
with a review of basic relationships between queries and linkset search
strategies.

4.2.1 Queries and Linkset Searching Strategies

Two types of retrieval operations are associated with linksets. One involves
the retrieval of child records of a given parent record; the other involves the
retrieval of a parent record of a given child record.

Queries that qualify the retrieval of child records of a parent record are
child queries. Two common strategies for retrieving child records are the
linkset scan and linkset partial scan. A linkset scan accesses all child records of a
linkset occurrence; a linkset partial scan searches for r designated child
records. The latter strategy is similar to the linkset scan, except that the
search terminates as soon as the r records have been retrieved. Linkset partial
scans are used to process child queries that specify an identifier for each
requested child record; linkset scans process all other child queries.
Summarized in Table 4.3 are child query classifications and the recommended
processing strategy for each classification.

Figure 4.6 A Data Structure Diagram of a Hospital Database

Each Conjunction of a Child Query        Linkset
Contains a Clause of the Form            Search Strategy

(identifier = value)                     linkset partial scan
otherwise                                linkset scan

Table 4.3. A Child Query - Linkset Search Strategy Relationship
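
The rule of Table 4.3 and the two scan strategies can be sketched as follows. The query representation is an assumption made for the example: a child query is a list of conjunctions, each conjunction a list of `(field, operator, value)` clauses, and child records are plain dictionaries. The count of accessed records is returned only to show where the partial scan stops early.

```python
from typing import Iterable, List, Tuple

Clause = Tuple[str, str, object]      # (field, operator, value)
Query = List[List[Clause]]            # disjunction of conjunctions


def choose_strategy(query: Query, identifier: str) -> str:
    """Partial scan only if every conjunction has an (identifier = value) clause."""
    if not query:
        return "linkset scan"
    for conjunction in query:
        if not any(f == identifier and op == "=" for f, op, _ in conjunction):
            return "linkset scan"
    return "linkset partial scan"


def linkset_scan(children: Iterable[dict]) -> Tuple[List[dict], int]:
    """Access every child record of the linkset occurrence."""
    found = list(children)
    return found, len(found)


def linkset_partial_scan(children: Iterable[dict], wanted: Iterable[object],
                         identifier: str) -> Tuple[List[dict], int]:
    """Stop as soon as all designated child records have been retrieved."""
    remaining = set(wanted)
    found, accessed = [], 0
    for record in children:
        if not remaining:
            break
        accessed += 1
        if record[identifier] in remaining:
            found.append(record)
            remaining.discard(record[identifier])
    return found, accessed


if __name__ == "__main__":
    kids = [{"id": i} for i in range(1, 6)]
    q = [[("id", "=", 2)], [("id", "=", 3), ("age", ">", 40)]]
    print(choose_strategy(q, "id"), linkset_partial_scan(kids, {2, 3}, "id")[1])
```

In this sketch the partial scan touches three of the five child records, whereas a full linkset scan would touch all five.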

Prior to child record retrieval, an important query processing technique
may be used. In the case of inverted files and inverted lists, intersection and
union operations determine the response set of a query quickly. Analogous
operations are used in processing queries with multilists and cellular multilists.
In the context of linksets, the intersection and union of lists correspond to the
intersection and union of cell directories.

The  result  of  an  intersection  or  union  of  cell  directories  is  another  cell 
directory.  In  order  for  the  intersection  and  union  operations  to  be  meaningful, 
the  participating  cell  directories  must  belong  to  linksets  which  share  a  common 
design.  That  is,  all  cell  directories  are  inverted  lists,  or  all  are  multilists,  etc.  It 
is  not  meaningful  to  intersect  a  cell  directory  of  an  inverted  file  with  a  cell 
directory  of  a  multilist  file. 

A child query may be described by the linkset search strategy used to
process it, and by estimates of the number of child records in a linkset
occurrence that satisfy the query. Such information is specified by values
assigned to those parameters whose definitions are listed in Table 4.4. This
collection of values defines the child query's descriptor. Methods of estimating
values of a child query descriptor are presented in Section 4.3.1.

Lss  linkset  search  strategy  used  to  process  a  child  query:  linkset  scan 
or  linkset  partial  scan 

f  selectivity  of  a  child  query:  expected  fraction  of  the  child  file  that 
satisfies  the  child  query  and  belongs  to  a  designated  linkset 
occurrence 

ef  exact  selectivity  of  a  child  query:  exact  fraction  (or  upper  bound  to 
the  exact  fraction)  of  the  child  file  that  satisfies  the  child 
query  and  belongs  to  a  designated  linkset  occurrence 

k  average  number  of  occupied  cells  distinguished  by  a  designated 
linkset  occurrence 

w  average  number  of  child  records  in  an  occupied  cell 

Table  4.4.  Parameters  of  a  Child  Query  Descriptor 

Queries that qualify the retrieval of a parent record of a given child record
are parent queries. A parent query may be described by the value assigned to
the parameter Pq. Pq is the probability that a given child record has a parent
record and that the parent record satisfies the parent query. A method of
estimating the value of Pq is presented in Section 4.3.3.

4.2.2  Basic  Operations 

Basic linkset operations involve child and parent record retrieval, and the
linking and unlinking of child records from a designated linkset occurrence.
These operations can be envisioned as procedures which return zero or more
records as their output. Functions that characterize these operations are cost
functions and response set size functions. In the following paragraphs, we will
identify four linkset operations and define their characteristic functions.

Let L be a linkset, C-Query be a child query, P-Query be a parent query,
cell_dir be a cell directory, and p_rec and c_rec be a parent record and a child
record. A general format of a child record retrieval operation is:

    RETRIEVE_CHILD cell_dir [(A1 ... Ak)] [WHERE C-Query] VIA L [HOLD]

where A1 ... Ak are attributes whose values are extracted from each child
record identified by cell_dir that satisfies C-Query. The values extracted from a
child record form an output record, and the collection of all output records
forms an output file. Duplicate output records are retained.²

A general format of a parent record retrieval operation is:

    RETRIEVE_PARENT c_rec [(A1 ... Ak)] [WHERE P-Query] VIA L [HOLD]

where A1 ... Ak are attributes whose values are extracted from c_rec's parent
record if the parent record satisfies P-Query. The values extracted from the
parent record form an output record.

For both the RETRIEVE_CHILD and RETRIEVE_PARENT formats, square
brackets enclose phrases that need not be specified. Omitting "(A1 ... Ak)"
implies all attribute values are desired; omitting "WHERE C-Query" implies that
all child records identified by cell_dir are to be output, and omitting "WHERE
P-Query" implies that the parent record of c_rec is to be output. Specifying
HOLD indicates that the records which are retrieved may be subject to
modification or deletion ([Date??]). A more detailed discussion of HOLD appears
in Section 5.3.

A LINK operation inserts a child record into a linkset occurrence. An
UNLINK operation removes a child record from a linkset occurrence. A general
format of a LINK and UNLINK operation is:

    LINK c_rec TO_PARENT p_rec VIA L
    UNLINK c_rec FROM_PARENT p_rec VIA L

where c_rec is a record of the Child file and p_rec is a record of the Parent file
of linkset L. Prior to issuing a LINK or UNLINK, a correspondence between a
user's copy of c_rec (p_rec) and c_rec in Child (p_rec in Parent) must be
established (see Section 5.3). Neither LINK nor UNLINK has any output.

² Because cell directories may be formed from the union and intersection of other cell
directories, it is not always possible to identify a cell directory with a parent record. For this
reason, a cell directory is the preferred argument to a RETRIEVE_CHILD, rather than a
parent record.

A LINK or UNLINK is realized, in part, by updating pointers in a user's copy
of records c_rec and p_rec. Thus, any modifications to c_rec and p_rec are
made in main memory. To effect these modifications to c_rec in Child and p_rec
in Parent, UPDATEs of these records must follow a LINK or UNLINK.
Consequently, common sequences of operations are:³

    LINK c_rec TO_PARENT p_rec VIA L
    UPDATE c_rec IN Child
    UPDATE p_rec IN Parent

and

    UNLINK c_rec FROM_PARENT p_rec VIA L
    UPDATE c_rec IN Child
    UPDATE p_rec IN Parent

Cost functions and response set size functions accept arguments that may
include the descriptors L and CQ of linkset L and C-Query, and the value Pq
(which describes P-Query). Table 4.5 lists the above linkset operations with their
characteristic functions. From the descriptions of the RETRIEVE_PARENT, LINK,
and UNLINK operations, we know that nRETP(Pq)=Pq, nLINK=0, and nUNLK=0.
Expressions that estimate other characteristic functions are developed in the
following sections.

³ Note that it is possible for c_rec and p_rec to remain unchanged after a LINK or UNLINK.
For example, if parent pointers are not used in serial implementations of linksets, c_rec
remains unchanged by a LINK or UNLINK. If cells contain an integral number of nodes in
serial implementations of linksets, it is possible for c_rec to be added to p_rec's linkset
occurrence without causing p_rec's cell directory to be modified. Hence, UPDATEs of c_rec
and p_rec may be unnecessary.


linkset operation                              cost           response set
                                               function       size function
------------------------------------------------------------------------------
RETRIEVE_CHILD cell_dir [(A1 ... Ak)]          RETC(L, CQ)    nRETC(L, CQ)
    [WHERE C-Query] VIA L [HOLD]

RETRIEVE_PARENT c_rec [(A1 ... Ak)]            RETP(L)        nRETP(Pq)
    [WHERE P-Query] VIA L [HOLD]

LINK c_rec TO_PARENT p_rec VIA L               LINK(L)        nLINK

UNLINK c_rec FROM_PARENT p_rec VIA L           UNLK(L)        nUNLK

Table 4.5 Characteristic Functions of Linkset Operations


4.3  Cost  Expressions  for  Linksets 

Developing cost expressions for linkset operations requires not only linkset
descriptor parameters, but also base file statistics of the child and parent files.
Table 4.6 lists the relevant statistics. For notational simplicity, the base file
subscript (0) has been eliminated from all quantities. With these parameters and
those of Table 4.1, useful statistics can be defined. For example,

$$
C = \text{number of cells in the child file} =
\begin{cases}
N & \text{if } Cs = 0 \\
\lceil Z/Cc \rceil & \text{if } Cs = 1 \\
1 & \text{if } Cs = 2
\end{cases}
$$

and

$$
\mathrm{Avew} = \text{average number of child records of a linkset occurrence per occupied cell}
= W / \xi(C, W, N)
$$

where $\xi$ is defined by equation (2.2).

Statistics of the Base File of the Child File

Z        number of base file nodes
H        expected number of records in a primary block
G        expected length of an overflow chain
A, Ao    primary and overflow block access costs
N        number of child records
Ro       record capacity of an overflow block

Statistics of the Base File of the Parent File

HH       expected number of records in a primary block
GG       average length of an overflow chain
AA, AAo  primary and overflow block access costs
NN       number of parent records

Table 4.6. Statistics of a Parent and Child File
4.3.1  Estimating  Child  Query  Descriptors 

The  parent  record  of  a  linkset  occurrence  contains  a  cell  directory  whose 
contents  facilitate  the  location  of  all  child  records  having  a  link  key  identical  to 
that  of  the  parent  record.  Each  cell  directory  can  therefore  be  related  to  a 
clause  of  the  form  (link  key  =  value).  It  is  this  relationship  that  enables  cell 
directories  to  be  used  in  the  processing  of  child  queries. 

Let <f,ef,k,w> be the descriptor of a cell directory where: 1) f is the
selectivity of the clause (link key = value) that is associated with the cell
directory. Since the average number of child records in a linkset occurrence (W)
and the total number of child records (N) are known, f=W/N. 2) ef is the exact
selectivity of (link key = value): ef=f if precisely W records satisfy the clause;
otherwise ef=1. 3) k is the average number of occupied cells in a linkset
occurrence, where k=ξ(C,W,N). 4) w is the average number of child records in an
occupied cell, where w=Avew.

When the union and intersection of cell directories are formed, another cell
directory is produced. To estimate the descriptors of such cell directories, we
again assume the independence of records satisfying clauses of a query.⁴ The
following are rules for applying these estimates:

$$
\begin{aligned}
\langle f_1, ef_1, k_1, w_1 \rangle \ \mathrm{AND}\ \langle f_2, ef_2, k_2, w_2 \rangle = \langle\ & f_1 \times f_2, && (4.1a) \\
& ef_1 \times ef_2, && (4.1b) \\
& k_1 \times k_2 / C, && (4.1c) \\
& \min(w_1, w_2)\ \rangle && (4.1d)
\end{aligned}
$$

$$
\begin{aligned}
\langle f_1, ef_1, k_1, w_1 \rangle \ \mathrm{OR}\ \langle f_2, ef_2, k_2, w_2 \rangle = \langle\ & f_1 + f_2 - f_1 \times f_2, && (4.2a) \\
& ef_1 + ef_2 - ef_1 \times ef_2, && (4.2b) \\
& k_1 + k_2 - k_1 \times k_2 / C, && (4.2c) \\
& w_1 + w_2 - w_1 \times w_2 \times C / N\ \rangle && (4.2d)
\end{aligned}
$$

⁴ Different assumptions are considered in ([Chr01]).

(4.1a-b) and (4.2a-b) follow directly from the independence assumption. When
two cell directories point to different lists within the same cell, only the shorter
list is retained, and (4.1d) follows. k1/C and k2/C are the fractions of the total
number of cells that are occupied by records of the two cell directories.
k1×k2/C² estimates the fraction of the total number of cells that remain
occupied after their intersection. The estimated number of occupied cells is
therefore k1×k2/C, and (4.1c) follows. (4.2c-d) follow from similar arguments.
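As a worked illustration, rules (4.1) and (4.2) can be written as two small Python functions. The sketch reads (4.2d) as w1 + w2 - w1×w2×C/N, which is an interpretation of the printed formula based on the independence argument; the class name `CellDirDescriptor` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellDirDescriptor:
    f: float   # selectivity
    ef: float  # exact selectivity
    k: float   # average number of occupied cells
    w: float   # average number of child records per occupied cell


def and_descriptors(d1: CellDirDescriptor, d2: CellDirDescriptor, C: float) -> CellDirDescriptor:
    """Descriptor of the intersection of two cell directories, rules (4.1a-d)."""
    return CellDirDescriptor(
        f=d1.f * d2.f,
        ef=d1.ef * d2.ef,
        k=d1.k * d2.k / C,
        w=min(d1.w, d2.w),
    )


def or_descriptors(d1: CellDirDescriptor, d2: CellDirDescriptor, C: float, N: float) -> CellDirDescriptor:
    """Descriptor of the union of two cell directories, rules (4.2a-d)."""
    return CellDirDescriptor(
        f=d1.f + d2.f - d1.f * d2.f,
        ef=d1.ef + d2.ef - d1.ef * d2.ef,
        k=d1.k + d2.k - d1.k * d2.k / C,
        # Assumed reading of (4.2d): overlap within a cell of N/C records.
        w=d1.w + d2.w - d1.w * d2.w * C / N,
    )
```

With C=100 cells and N=1000 child records, intersecting <0.1, 0.1, 20, 5> with <0.2, 1.0, 40, 5> gives roughly <0.02, 0.1, 8, 5>, and their union gives roughly <0.28, 1.0, 52, 7.5>.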

Since cell directories are related to clauses, it is possible to assign a cell
directory descriptor to a child query. This is accomplished by assigning a cell
directory descriptor to each clause of the query and applying (4.1) and (4.2)
according to the query's AND and OR connectives. Descriptors are assigned to
clauses in the following way. If a cell directory is to be used in processing a
query, its corresponding clause is assigned a descriptor whose values were
defined earlier. For each clause with no cell directory, a default descriptor of
<f, ef, C, N/C> is assigned, where f and ef are the selectivities of the clause.
Note that the sole purpose of introducing default descriptors is to ensure that
the selectivities f and ef of a child query are properly estimated.

Combining the values of a child query's cell directory descriptor <f,ef,k,w>
with its processing strategy Lss specifies the values of the child query's
descriptor (see Table 4.4).

For practical considerations, it is valuable to note that cell directory
descriptors of clauses need not be used if it is simpler to estimate the cell
directory descriptor of a child query directly.


4.3.2  RETRIEVE_CHILD


Again applying standard techniques, expressions that estimate the costs of
performing linkset operations can be developed progressively. Assume that all
records of a child file have an equal probability of belonging to a given linkset
occurrence. Suppose the r child records of a linkset occurrence are to be
retrieved. On the average, $\frac{H}{H+G}\,r$ of these records will reside in primary
blocks, and $\frac{G}{H+G}\,r$ will be in overflow. Estimates of the cost to access these
records both directly and via a list are:

DAC(r,b) = cost of retrieving r records stored in b nodes, given pointers
           to these records and that no block is accessed more than
           once

$$
\mathrm{DAC}(r,b) = \xi\!\left(b,\ \frac{H}{H+G}\,r,\ b \times H\right) A
 + \xi\!\left(\frac{Z \times G}{Ro},\ \frac{G}{H+G}\,r,\ Z \times G\right) Ao
$$

LAC(r) = cost of following a list of r child records


A linkset scan searches k cells for k×w child records, w records per cell.
The cost of a linkset scan is approximately:

$$
\mathrm{LSC}(k,w) =
\begin{cases}
\mathrm{DAC}(k, Z) & \text{if } Sl=0,\ Cs=0,\ Ko=0 \\
\mathrm{LAC}(k) & \text{if } Sl=0,\ Cs=0,\ Ko=1 \\
k \times Cc \times (A + Ao \times G) & \text{if } Sl=0,\ Cs=1 \\
k \times \mathrm{DAC}(w, Z/C) & \text{if } Sl=1,\ Ic=0,\ Ko=0 \\
k \times \mathrm{LAC}(w) & \text{otherwise}
\end{cases}
$$

In the case where Sl=1, Ic=0, and Ko=0, cell directories correspond to multilists
or cellular multilists where child records may be retrieved in any order. Since
the lists are noninformation carrying, a record's membership on a list can be
deduced from the record's content. By examining all records of a block and
identifying all child records that would be on the list, it is possible to ensure that
a block is accessed only once. This method of list traversal is an improvement
of the Claybrook-Yang algorithm ([ClYa78]).

A linkset partial scan searches for r records in k cells with w records per
cell. Setting r=1, we have:

LPS1(k,w) = cost of locating 1 record using a linkset partial scan

$$
\mathrm{LPS1}(k,w) =
\begin{cases}
\mathrm{NPS1}(k \times Cc,\ 0) & \text{if } Sl=0,\ Cs=1 \\
\left(\dfrac{k \times w + 1}{2 \times k \times w}\right) \mathrm{LSC}(k,w) & \text{otherwise}
\end{cases}
$$

In the case of Sl=0 and Cs=1, a partial scan of k×Cc nodes locates the desired
record;⁵ in all other cases, a linkset partial scan examines (k×w+1)/2 child
records on the average, which accounts for a fraction (k×w+1)/(2×k×w) of
the cost of a linkset scan.

The cost of a linkset partial scan is approximately:

$$
\mathrm{LPS}(r,k,w) = \mathrm{LPS1}(k,w) + \left(\frac{r-1}{r+1}\right)\bigl(\mathrm{LSC}(k,w) - \mathrm{LPS1}(k,w)\bigr)
$$

Therefore,

$$
\mathrm{RETC}(L, CQ) =
\begin{cases}
\mathrm{LSC}(k,w) & \text{if } Lss = \text{linkset scan} \\
\mathrm{LPS}(ef \times N,\ k,\ w) & \text{if } Lss = \text{linkset partial scan}
\end{cases}
$$

⁵ NPS1 is evaluated using the values of the Child file descriptor.

The number of records that are output by a RETRIEVE_CHILD is estimated
by the product of the child query selectivity and child file size:

$$
\mathrm{nRETC}(L, CQ) = f \times N
$$
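The partial scan estimates can be tried numerically with a short Python sketch. It makes several simplifying assumptions: the cost of a full linkset scan is passed in as a number (`lsc_cost`) rather than derived from DAC or LAC, only the "otherwise" branch of LPS1 is modeled, and the coefficient of the interpolation term is read as (r-1)/(r+1). All function names are hypothetical.

```python
def lps1(k: float, w: float, lsc_cost: float) -> float:
    """Cost of locating one record: about half of a full linkset scan."""
    n = k * w  # child records reachable through the cell directory
    return (n + 1) / (2 * n) * lsc_cost


def lps(r: float, k: float, w: float, lsc_cost: float) -> float:
    """Cost of locating r records; equals lps1 when r == 1 and
    approaches the full scan cost as r grows."""
    first = lps1(k, w, lsc_cost)
    return first + (r - 1) / (r + 1) * (lsc_cost - first)


def retc(lss: str, ef: float, N: float, k: float, w: float, lsc_cost: float) -> float:
    """Estimated RETRIEVE_CHILD cost for the chosen search strategy."""
    if lss == "linkset scan":
        return lsc_cost
    return lps(ef * N, k, w, lsc_cost)


def n_retc(f: float, N: float) -> float:
    """Expected number of records output by a RETRIEVE_CHILD."""
    return f * N
```

For k=4 cells of w=5 records and a full scan cost of 100, locating one record costs about 52.5 and locating three costs about 76.25, which shows how quickly a partial scan approaches the cost of a full scan.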


4.3.3  RETRIEVE_PARENT 

Let RP be the cost of accessing a parent record given a pointer to the
record:

$$
\mathrm{RP}(L) = AA \left(\frac{HH}{HH+GG}\right) + AAo \left(\frac{GG}{HH+GG}\right)
$$

Let η be the probability that a child record does not belong to a linkset
occurrence (i.e., the record is not a member of a list of child records, nor does it
have a parent pointer). In such cases, the cost of a RETRIEVE_PARENT is 0.
With probability 1-η, a child record belongs to a linkset occurrence, and the
cost of a RETRIEVE_PARENT is nonzero. Taking a weighted average of these
costs yields the expected cost of a RETRIEVE_PARENT. Assuming that all child
records have an equal probability of having a parent record, η is taken to be
1-(NN×W)/N. We find:


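$$
\mathrm{RETP}(L) = \left(\frac{NN \times W}{N}\right) \times
\begin{cases}
\mathrm{RP}(L) & \text{if } Pp=1 \\
\mathrm{LPS}(1,\ 1,\ \mathrm{Avew}) + \mathrm{RP}(L) & \text{if } Pp=0 \text{ and } Rl=1 \\
\text{undefined} & \text{otherwise}
\end{cases}
$$

Note that if neither parent pointers nor ring lists are used, then the parent
record of a child record cannot be located via linkset L.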

The probability that a parent record is returned by a RETRIEVE_PARENT is
Pq (defined in Section 4.2.1). Pq may be estimated in the following way. Let fpq
be the selectivity of the P-Query of a RETRIEVE_PARENT. That is, fpq is the
probability that a record of the parent file satisfies P-Query. (NN×W)/N estimates
the probability that a child record has a parent record. Assuming that a parent
record satisfies P-Query independent of its parent status, Pq is estimated by:

$$
Pq = fpq \times NN \times W / N
$$

The expected number of records output by a RETRIEVE_PARENT is:

$$
\mathrm{nRETP}(Pq) = Pq
$$
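The RETRIEVE_PARENT estimates can be illustrated with the following Python sketch. It assumes that, when only ring lists are available (Pp=0, Rl=1), the cost of walking the ring and the cost of reading the parent record add together; the ring walk cost LPS(1, 1, Avew) is passed in as a number. The function names are hypothetical.

```python
def rp(AA: float, AAo: float, HH: float, GG: float) -> float:
    """Cost of accessing a parent record given a pointer to it."""
    primary_share = HH / (HH + GG)  # chance the record is in a primary block
    return AA * primary_share + AAo * (1.0 - primary_share)


def parent_probability(NN: float, W: float, N: float) -> float:
    """Probability that a child record has a parent record, 1 - eta."""
    return NN * W / N


def retp(NN: float, W: float, N: float, Pp: int, Rl: int,
         rp_cost: float, ring_walk_cost: float = 0.0) -> float:
    """Expected RETRIEVE_PARENT cost, weighted by the chance of a parent."""
    has_parent = parent_probability(NN, W, N)
    if Pp == 1:
        return has_parent * rp_cost
    if Rl == 1:
        # Assumed: follow the ring list to its end, then read the parent.
        return has_parent * (ring_walk_cost + rp_cost)
    raise ValueError("parent cannot be located via this linkset")


def pq(fpq: float, NN: float, W: float, N: float) -> float:
    """Probability that a RETRIEVE_PARENT returns a parent record."""
    return fpq * parent_probability(NN, W, N)
```

With NN=100 parents, W=8 children per occurrence, and N=1000 child records, 80% of child records have a parent; a parent query with selectivity 0.5 then yields Pq of about 0.4.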


4.3.4  LINK 


A LINK operation inserts a child record into a linkset occurrence. It is
assumed that the blocks of files Child and Parent that contain the child and
parent records c_rec and p_rec are in main memory when a LINK is processed.
In addition to updating pointers in c_rec and p_rec, it may also be necessary to
retrieve and update pointers of other child records.

If Child and L are descriptors of file Child and linkset L, the estimated cost
of processing a LINK operation is:

$$
\begin{aligned}
\mathrm{LINK}(L) ={} &
\begin{cases}
\mathrm{LPS}(1,\ W,\ 1) & \text{if } Ko=1 \text{ and } Cs=0 \\
\mathrm{LPS}(1,\ 1,\ \mathrm{Avew}) & \text{if } Ko=1 \text{ and } Cs>0 \\
0 & \text{otherwise}
\end{cases}
&& (4.3a) \\
& + \mathrm{UPD}(Child) \times
\begin{cases}
0 & \text{if } Sl=0 \quad (4.3b) \\
0 & \text{if } Sl=1,\ Ko=0,\ Dl=0 \quad (4.3c) \\
2 & \text{if } Sl=1,\ Ko=0,\ Dl=1 \quad (4.3d) \\
1 - 1/(\mathrm{Avew}+1) & \text{if } Sl=1,\ Ko=1,\ Dl=0 \quad (4.3e) \\
2 - 2/(\mathrm{Avew}+1) & \text{if } Sl=1,\ Ko=1,\ Dl=1 \quad (4.3f)
\end{cases}
\end{aligned}
$$

(4.3a) estimates the cost of locating the insertion point of record c_rec
when child records are maintained in key order.

(4.3b)-(4.3f) estimate the cost of updating pointers of one or more child
records other than c_rec. When linksets are implemented serially (4.3b) or
when c_rec is inserted at the head of an unordered, singly linked list of child
records (4.3c), only c_rec and p_rec are updated and no additional costs are
incurred. In (4.3d), c_rec is inserted at the head of a doubly linked list. The
factor 2×UPD(Child) accounts for the cost of reading and updating the
back-pointer of the child record that immediately follows c_rec on a doubly linked
list. (4.3e) and (4.3f) estimate the cost of updating pointers in one or more child
records when c_rec is inserted into a key ordered list. Factors involving Avew
(the average length of a list of child records) account for the possibility that one
of the list-adjacent⁶ records of c_rec is the parent record.
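The case analysis of (4.3b)-(4.3f) can be expressed as a small Python function. This is a sketch only: the search cost of (4.3a) and the value of UPD(Child) are supplied as numbers rather than computed, and the function names are hypothetical.

```python
def link_update_factor(Sl: int, Ko: int, Dl: int, avew: float) -> float:
    """Multiplier of UPD(Child) in the LINK cost, cases (4.3b)-(4.3f)."""
    if Sl == 0:
        return 0.0                        # (4.3b) serial implementation
    if Ko == 0:
        return 2.0 if Dl == 1 else 0.0    # (4.3d) / (4.3c) insert at list head
    # Key ordered list: a neighbour may be the parent record, which is
    # already in main memory, hence the correction involving Avew.
    base = 1.0 - 1.0 / (avew + 1.0)
    return 2.0 * base if Dl == 1 else base  # (4.3f) / (4.3e)


def link_cost(search_cost: float, upd_child_cost: float,
              Sl: int, Ko: int, Dl: int, avew: float) -> float:
    """Estimated LINK cost: insertion point search plus pointer updates."""
    return search_cost + upd_child_cost * link_update_factor(Sl, Ko, Dl, avew)
```

For a key ordered, doubly linked list with Avew=4, the factor is 1.6, so a search cost of 3 and an update cost of 2 give an estimated LINK cost of about 6.2.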

4.3.5  UNLINK 


An UNLINK operation removes a child record from a linkset occurrence. It
is assumed that the blocks of files Child and Parent that contain the child and
parent records c_rec and p_rec are in main memory when an UNLINK is
processed. In addition to updating pointers in c_rec and p_rec, it may also be
necessary to retrieve and update pointers of other child records.

If Child and L are descriptors of file Child and linkset L, the estimated cost
of processing an UNLINK is:

$$
\begin{aligned}
\mathrm{UNLK}(L) ={} &
\begin{cases}
\mathrm{LPS}(1,\ 1,\ \mathrm{Avew}) & \text{if } Sl=1,\ Dl=0 \\
0 & \text{otherwise}
\end{cases}
&& (4.4a) \\
& + \mathrm{UPD}(Child) \times
\begin{cases}
0 & \text{if } Sl=0 \quad (4.4b) \\
1 - 1/\mathrm{Avew} & \text{if } Sl=1,\ Dl=0 \quad (4.4c) \\
2 - 2/\mathrm{Avew} & \text{if } Sl=1,\ Dl=1 \quad (4.4d)
\end{cases}
\end{aligned}
$$

In order to update a singly linked list, the record prior to c_rec must be
located. (4.4a) estimates the cost of locating this record.

(4.4b)-(4.4d) estimate the cost of updating one or more child records other
than c_rec. When linksets are implemented serially, only p_rec is updated, so
no additional costs are incurred (4.4b). (4.4c) estimates the cost of updating a
list pointer of the record prior to c_rec on a singly linked list. (4.4d) estimates
the cost of reading and updating pointers of the list-adjacent records of c_rec
on a doubly linked list. Factors involving Avew in (4.4c) and (4.4d) account for
the possibility that one of the list-adjacent records of c_rec is the parent record
(see footnote 7).

⁶ List-adjacent records of c_rec are records that follow and precede c_rec on a designated
list.

⁷ No costs are attributable to modifying the parent record since it is in main memory.
Recall from Section 4.2.2, an UPDATE of the parent record is not a part of a LINK operation.
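A matching Python sketch for the UNLINK cases (4.4a)-(4.4d) follows. As an assumption made for illustration, the predecessor search cost LPS(1, 1, Avew) and the value of UPD(Child) are passed in as numbers; the function names are hypothetical.

```python
def unlink_update_factor(Sl: int, Dl: int, avew: float) -> float:
    """Multiplier of UPD(Child) in the UNLINK cost, cases (4.4b)-(4.4d)."""
    if Sl == 0:
        return 0.0                          # (4.4b) serial implementation
    base = 1.0 - 1.0 / avew                 # a neighbour may be the parent
    return 2.0 * base if Dl == 1 else base  # (4.4d) / (4.4c)


def unlink_cost(predecessor_search_cost: float, upd_child_cost: float,
                Sl: int, Dl: int, avew: float) -> float:
    """Estimated UNLINK cost: predecessor search plus pointer updates."""
    # (4.4a): only a singly linked list requires a search for the record
    # that precedes c_rec; a doubly linked list already points to it.
    search = predecessor_search_cost if (Sl == 1 and Dl == 0) else 0.0
    return search + upd_child_cost * unlink_update_factor(Sl, Dl, avew)
```

With Avew=4, a search cost of 5, and an update cost of 2, a singly linked list gives about 6.5 while a doubly linked list gives about 3.0: the back-pointer removes the search at the price of a second pointer update.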


4.4  Linkset  Storage  Requirements  and  Estimating  Record  Lengths 

It is not unusual for pointers, which implement linksets, to account for a
considerable portion of a record's length. This, in turn, indicates that a linkset
implementation can increase the storage requirements of a file significantly. As
an aid for estimating record lengths and linkset storage requirements, a
function PTR is presented. PTR estimates the expected number of pointers stored
in each record of a specific file which are used to implement a specific linkset.
The arguments to PTR are a file name F and the descriptor L of linkset L:

$$
\mathrm{PTR}(F, L) =
\begin{cases}
Sl + Dl + Pp & \text{if } F = Child \\
(1 + Rl \times Dl) \times \xi(C, W, N) & \text{if } F = Parent \\
0 & \text{otherwise}
\end{cases}
$$

PTR(F,L) can be understood in the following way. If F is the child file of L, up to
three pointers are stored with each record of F: namely, a pointer to 1) the
next child record, 2) the previous child record, and 3) the parent record. The
actual number of pointers is determined by the implementation of L. Similarly,
if F is the parent file of L, each record of F is the parent record of a linkset
occurrence with an average of ξ(C,W,N) occupied cells. There is at least one
pointer to each occupied cell; a second is needed if doubly linked ring lists are
used. If F is neither the parent nor the child file of L, there are no pointers stored
in records of F to implement L.
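The pointer count can be sketched in Python as follows. Two assumptions are made for illustration: the child-file case is read as the sum Sl + Dl + Pp (one pointer each for next, previous, and parent), and the average number of occupied cells per linkset occurrence is passed in as a number instead of being computed from equation (2.2). The function name and the `role` argument are hypothetical.

```python
def ptr(role: str, Sl: int, Dl: int, Pp: int, Rl: int,
        occupied_cells: float) -> float:
    """Expected number of pointers per record used to implement one linkset.

    role is "child", "parent", or anything else for an unrelated file.
    """
    if role == "child":
        # next-record, previous-record, and parent pointers, when present
        return Sl + Dl + Pp
    if role == "parent":
        # one pointer per occupied cell, two for doubly linked ring lists
        return (1 + Rl * Dl) * occupied_cells
    return 0
```

A child record on a doubly linked list with parent pointers carries three pointers, while a parent record whose occurrence spans 2.5 occupied cells on average carries five when doubly linked ring lists are used.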

The length of a record equals the total length of all fields containing an
attribute value plus the total length of all pointers stored with the record:

$$
\text{length of a record of file } F =
\text{total length of all fields containing attribute values}
 + (\text{length of a pointer}) \times \sum_{L \,\in\, \text{set of all linkset descriptors}} \mathrm{PTR}(F, L)
$$

Once the length of the records of a file is known, values of the file's descriptor
such as primary and overflow block capacities can be estimated easily (see
Section 6.3 and Appendix IV).

CHAPTER 5. A MODEL OF TRANSACTIONS


A transaction is a procedure that performs operations on a database. A
model of transactions involves 1) describing a transaction as a sequence of
database operations, and 2) using this representation to estimate the cost of
processing the transaction.

In this chapter, a model of transactions is presented. We begin by
identifying a set of auxiliary operations that are performed on databases, and develop
expressions that estimate their execution cost. We then introduce a notation for
modeling transactions and present rules for translating a transaction's
representation into a cost function. Examples are given to illustrate the model.


5.1  Auxiliary  Operations 

Basic auxiliary operations common to database processing include file
sorting, projection, creation, erasure, and forming the join of two files. Like simple
file and linkset operations, these operations can be envisioned as procedures
which return zero or more records as their output. Functions that characterize
these operations are cost functions and response set size functions. In the
following paragraphs, we will identify five operations and define their characteristic
functions.


Let F-Query and G-Query be queries over the simple files F and G. A general
format of a sort operation is:

    SORT F [(A1 ... Ak)] [WHERE F-Query] OVER sortkey

where all records of F that satisfy F-Query are sorted on the key sortkey.
A1 ... Ak are attributes whose values are extracted from each record of the
sorted file. Such a collection of values forms an output record, and the
collection of all output records forms an output file. Duplicate output records are
retained.


A projection eliminates attributes from records of a file and removes
duplicate records resulting in the process. The attributes that are retained are
projection attributes. A general format of a projection operation is:

    PROJECT F [(A1 ... Ak)] [WHERE F-Query]

where all records of F that satisfy F-Query are projected over attributes
A1 ... Ak. The resulting records are the output of a PROJECT.
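A PROJECT can be illustrated with a minimal Python sketch. It assumes records are held in memory as dictionaries, that output records are tuples of the projected values, and that the first occurrence of each duplicate is the one kept; none of these choices comes from the text, and the function name `project` is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

Record = Dict[str, object]


def project(records: Iterable[Record], attributes: Sequence[str],
            predicate: Optional[Callable[[Record], bool]] = None) -> List[Tuple]:
    """Project qualifying records over the given attributes, dropping duplicates."""
    seen = set()
    output = []
    for rec in records:
        if predicate is not None and not predicate(rec):
            continue  # record does not satisfy F-Query
        row = tuple(rec[a] for a in attributes)
        if row not in seen:  # keep only the first copy of each output record
            seen.add(row)
            output.append(row)
    return output
```

Projecting three records with values a=1, a=1, and a=2 over attribute a yields two output records, which is the duplicate removal that distinguishes PROJECT from SORT.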

Let A be an attribute of file F and A' be an attribute of file G. An equi-join or
natural join of F and G over attributes A and A' is denoted by F[A=A']G and is
defined by:

$$
F[A = A']G = \{\ x,y \mid x \in F \text{ and } y \in G \text{ and } x[A] = y[A']\ \}
$$

where x,y is the concatenation of records x and y, and x[A] is the value of
attribute A for record x. A and A' are said to be join attributes, or join keys, of F and
G. Informally, an equi-join pairs records of one file with records of another so
that both records of a pair possess a common value for designated attributes.¹
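The definition of the equi-join can be made concrete with a short Python sketch. It assumes in-memory records represented as dictionaries and builds a hash index on G, which is only one of several ways to evaluate the join and is not the cluster key join discussed in the text; the function name `equi_join` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def equi_join(F: Iterable[Record], G: Iterable[Record],
              a: str, a_prime: str) -> List[Tuple[Record, Record]]:
    """Return every pair (x, y) with x in F, y in G, and x[a] == y[a_prime]."""
    # Index the records of G by the value of their join attribute.
    index: Dict[object, List[Record]] = {}
    for y in G:
        index.setdefault(y[a_prime], []).append(y)
    # Pair each record of F with the records of G sharing its join value.
    return [(x, y) for x in F for y in index.get(x[a], [])]
```

Joining two F records with dept values 1 and 2 against three G records with d values 1, 1, and 3 produces two pairs, both for the F record with dept 1.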

An operation which forms the equi-join of two files over their cluster keys is
a cluster key join. A general format of a cluster key join is:

    CK_JOIN F [(A1 ... Ak)] [WHERE F-Query]
        WITH G [(A'1 ... A'm)] [WHERE G-Query]
        OVER join-clause

where the records of F that satisfy F-Query are equi-joined with records of G that
satisfy G-Query. The joining condition is specified by join-clause, which takes the
form (cluster key of F = cluster key of G). For each pair of records that are
joined, values of attributes A1 ... Ak are extracted from the record of F and
values of attributes A'1 ... A'm are extracted from the record of G. This
collection of values forms an output record, and the collection of all output records
forms an output file. Duplicate output records are retained.

¹ A generalization of the equi-join is the θ-join [Codd72]:

    F[A θ A']G = { x,y | x ∈ F and y ∈ G and x[A] θ y[A'] }

where θ is a comparison relation such as <, =, or >. In this thesis, we shall only be
concerned with equi-joins.

It  is  often  the  case  that  files  are  joined  over  attributes  that  are  not  cluster 
keys.  In  such  cases,  joins  can  be  realized  by  executing  sequences  of  simple  file, 
linkset,  and  auxiliary  operations  ([Got75],  [BlEs77],  [Yao79]).  Examples  of  such 
joins  are  given  in  Section  5.4. 

A CREATE operation organizes a collection of records into a simple file
structure. A general format of a CREATE is:

    CREATE G FROM F [(A1 ... Ak)] [WHERE F-Query]

where G is the file to be created, and the values of attributes A1 ... Ak are
extracted from all records of F that satisfy F-Query. The values extracted from
a record of F form a record of G. Duplicate records are retained. A CREATE
returns no records as output.

A CREATE operation stores records in G in the order that they are retrieved
from F (i.e., in cluster key order). Thus, the record sequencing of F and G will
be identical. This does not imply that F and G have identical cluster key types.
For example, G could be a hash based file (i.e., Ck=hash key) and F could be a
B+ tree (i.e., Ck=logical valued key). This is possible if the cluster key of each
record of F is assigned a value equal to the record's hash key. In this way,
records of F are sequenced on ascending hash keys.


An ERASE operation eliminates a file from a database. A general format of
an ERASE is:

    ERASE F

where F is the simple file that is to be eliminated. It is assumed that no record
of F is linked to another simple file. No records are output by an ERASE.

In the above formats, square brackets enclose phrases that need not be
specified. Omitting "(A1 ... Ak)" implies all attribute values are desired;
omitting "WHERE F-Query" implies all records of F participate in the operation.

Cost  functions  and  response  set  size  functions  accept  arguments  that  may 
include  the  descriptors  F  and  G  of  F  and  G,  and  FQ  and  GQ  of  F-Query  and  G- 
Query.  Other  arguments  may  include  values  assigned  to  those  parameters 
whose  definitions  are  listed  in  Table  5.1.  Methods  of  estimating  these  values  are 
proposed  in  the  following  sections. 


As      cost of transferring the contents of an internal buffer to and
        from secondary storage

Ib      the number of internal buffers allocated to a PROJECT or SORT

Bc      record capacity of an internal buffer

Ps      projection selectivity: the fraction of records of a file that
        remain after duplicate records are removed

GtoF    the average number of records of G that are joined to a record of F

Table 5.1 A List of Auxiliary Parameters

Table 5.2 lists the above auxiliary operations with their characteristic
functions. From the above descriptions of the CREATE and ERASE operations, we
know that nCRE = 0 and nERAS = 0. Expressions that estimate other characteristic
functions are developed in the following sections.


auxiliary operation                        cost function              response set size function

SORT F [WHERE F-Query]                     SORT(F, FQ, Ib, Bc, As)    nSORT(F, FQ)
  OVER sortkey

PROJECT F (A1 ... Ak)                      PROJ(F, FQ, Ib, Bc, As)    nPROJ(F, FQ, Ps)
  [WHERE F-Query]

CK_JOIN F [(A1 ... Ak)] [WHERE F-Query]    JOIN(F, FQ, G, GQ)         nJOIN(F, FQ, G, GQ, GtoF)
  WITH G [(A1 ... Am)] [WHERE G-Query]
  OVER join-clause

CREATE G FROM F [(A1 ... Ak)]              CRE(G, F, FQ)              nCRE
  [WHERE F-Query]

ERASE F                                    ERAS                       nERAS

Table 5.2 Characteristic Functions of Auxiliary Operations

5.2  Cost  Expressions  for  Auxiliary  Operations 

A convenient notation regarding the use of descriptors is to superscript a
primitive or derived quantity with the descriptor from which the value was
obtained. For example, a superscript L marks the expected number of child
records per linkset occurrence as belonging to the linkset with descriptor L,
and a superscript F marks the number of base file nodes as belonging to the
file with descriptor F. This notation will be used in the following sections.

5.2.1  SORT 

A SORT begins with the retrieval of all records that satisfy F-query. Each
selected record is trimmed of all attribute values that are not needed for output
or for sorting. The number of internal buffers allocated to a SORT is Ib. Each
buffer contains a block which can accommodate Bc trimmed records.

Records are sorted using an (Ib-1)-way merge-sort algorithm. ^ A run is a
sequence of sorted records stored in one or more blocks. With the aid of an
internal sorting algorithm, initial runs are created to be of length Ib blocks.

\(N^{F} \times f^{FQ}\) (abbr. Nxf) estimates the number of records that satisfy F-query.


The initial number of runs is therefore

\[ \frac{Nxf}{Ib \times Bc} \]

The number of runs is reduced by a factor of Ib-1 by collecting runs into
groups of Ib-1 and merging the runs of each group into a single run. This task is
called a merge pass. A SORT terminates when a single run is produced. The
number of merge passes to achieve termination is

\[ \log_{Ib-1}\left(\frac{Nxf}{Ib \times Bc}\right) \]

Setting As to be the cost of transferring the contents of an internal buffer to
and from secondary storage, the cost of a SORT is approximately:

\[ \mathrm{SORT}(F, FQ, Ib, Bc, As) = \mathrm{RET}(F, FQ) + 2 \times \frac{Nxf}{Bc} \times As \times \log_{Ib-1}\left(\frac{Nxf}{Ib \times Bc}\right) \]

The expected number of records that are output by a SORT is:

nSORT(F, FQ) = Nxf
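The SORT estimate can be evaluated numerically. The following Python sketch is only an illustration under stated assumptions: the function names sort_passes and sort_cost are ours, the value of RET(F, FQ) is supplied by the caller as ret_cost, and the optional round_up flag (which rounds block counts and merge passes up to whole numbers) is an assumption that is not part of the expression above.

```python
import math

def sort_passes(nxf, ib, bc):
    """Merge passes: log base (Ib-1) of the initial number of runs."""
    if ib < 3:
        raise ValueError("an (Ib-1)-way merge needs Ib >= 3")
    initial_runs = nxf / (ib * bc)
    if initial_runs <= 1:
        return 0.0  # a single initial run needs no merging
    return math.log(initial_runs, ib - 1)

def sort_cost(ret_cost, nxf, ib, bc, a_s, round_up=False):
    """RET(F,FQ) + 2 x (Nxf/Bc) x As x (number of merge passes)."""
    blocks = nxf / bc
    passes = sort_passes(nxf, ib, bc)
    if round_up:
        # Assumption: count whole blocks and whole passes. The small
        # epsilon guards against floating-point noise in the logarithm.
        blocks = math.ceil(blocks)
        passes = math.ceil(passes - 1e-9)
    return ret_cost + 2 * blocks * a_s * passes
```

With round_up left at False the function follows the expression term by term, so fractional passes are possible; whether to round is a modeling choice left to the reader.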


5.2.2  PROJECT 

A projection may be accomplished, in part, by a SORT where the sort key is
defined by the projection attributes. In a sorted file, duplicate records appear
in consecutive record positions. Consequently, detecting and removing duplicate
records is a simple process. A cost estimate of a PROJECT operation based on a
SORT is:

PROJ(F, FQ, Ib, Bc, As) = SORT(F, FQ, Ib, Bc, As)

^ Other sorting algorithms could have been used. A merge-sort was chosen because it is a
common method and it is easy to analyze.


where all arguments of PROJ have definitions that are identical to their SORT
counterparts. It is worth noting that if there is an identifier among the
projection attributes, then no duplicate records will be formed by attribute
elimination. In such cases, a SORT is not required and the cost of a PROJECT is
zero.

The expected number of records that are sorted by a PROJECT is Nxf.
Once duplicate records are removed, only a fraction Ps of these records remain.

The number of records that are output by a PROJECT is:

nPROJ(F, FQ, Ps) = Ps x Nxf


An upper bound to Ps may be determined in the following way. Let \(A_i\) be an
output attribute of a PROJECT, and let \(v_i\) be the number of distinct values
assumed by \(A_i\) in F. (Note that the selectivity of (\(A_i\) = value) is
\(1/v_i\).) The maximum number of distinct records a PROJECT could return is
\(\min(Nxf, \prod_{i=1}^{k} v_i)\). Thus, an upper bound to the fraction of Nxf
records that remain after duplicate records are removed is:

\[ Ps \le \min\left(1, \frac{\prod_{i=1}^{k} v_i}{Nxf}\right) \]
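As a small worked illustration of the bound on Ps, the following Python sketch computes it from a list of distinct-value counts. The function names and the sample figures in the usage note are hypothetical and serve only to show the arithmetic.

```python
import math

def ps_upper_bound(distinct_counts, nxf):
    """Upper bound on Ps: min(1, (v_1 x ... x v_k) / Nxf)."""
    if nxf <= 0:
        raise ValueError("nxf must be positive")
    return min(1.0, math.prod(distinct_counts) / nxf)

def nproj_upper_bound(distinct_counts, nxf):
    """Corresponding bound on nPROJ = Ps x Nxf."""
    return ps_upper_bound(distinct_counts, nxf) * nxf
```

For instance, projecting 1000 qualifying records onto two attributes with 10 and 5 distinct values can leave at most 50 distinct records, so the bound on Ps is 50/1000. When the product of the distinct-value counts exceeds Nxf, the bound is simply 1.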


5.2.3  CK_JOIN 

A RETRIEVE operation outputs records in cluster key order. By concurrently
retrieving records of F and G and pairing those records that satisfy the
join-clause, a CK_JOIN of F and G is realized. The estimated cost of processing
a CK_JOIN operation is:

JOIN(F, FQ, G, GQ) = RET(F, FQ) + RET(G, GQ)


Let \(N^{F} \times GtoF\) be the number of records that would be output if all
records of F and G were eligible to participate in a CK_JOIN. GtoF is the average
number of records of G that are joined to a record of F. If only a fraction
\(f^{FQ}\) of the records of F were eligible to participate, the expected size of
the output file would be \(N^{F} \times GtoF \times f^{FQ}\). Furthermore, if only
a fraction \(f^{GQ}\) of the records of G were eligible to participate, the
expected size of the output file of a CK_JOIN would be:

\[ \mathrm{nJOIN}(F, FQ, G, GQ, GtoF) = N^{F} \times GtoF \times f^{FQ} \times f^{GQ} \]
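The two CK_JOIN estimates are simple enough to state as code. The sketch below is illustrative only: the function and parameter names are ours, and the values of RET(F, FQ) and RET(G, GQ) are assumed to have been computed elsewhere. Note that multiplying the two query selectivities treats the eligibility of F records and G records as independent.

```python
def join_cost(ret_f, ret_g):
    """JOIN(F,FQ,G,GQ): one retrieval pass over each file."""
    return ret_f + ret_g

def njoin(n_f, g_to_f, f_fq=1.0, f_gq=1.0):
    """Expected output size of a CK_JOIN.

    n_f    -- number of records in F
    g_to_f -- average number of G records joined to a record of F
    f_fq   -- fraction of F records that satisfy F-Query
    f_gq   -- fraction of G records that satisfy G-Query
    """
    return n_f * g_to_f * f_fq * f_gq
```

Leaving both fractions at their default of 1 corresponds to the case in which all records of F and G are eligible to participate.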

5.2.4  CREATE 

From the definition of the CREATE operation, the expected number of
records of F that satisfy F-Query equals the number of records in G:

N^G = Nxf

It is assumed that the above relationship is observed when values of G's
descriptor are being estimated. ^

Let  CON(G)  be  the  cost  of  constructing  file  G  (ie.,  the  cost  of  writing  all 
blocks  containing  cluster  index  and  base  file  records  of  G): 

L-l  xG 

CON{G)  =  X]  ^  ^  L 

1=0 

It follows that the estimated cost of a CREATE operation is:

CRE(G, F, FQ) = RET(F, FQ) + CON(G)

A CREATE outputs no records:

nCRE = 0


^ It is to be understood that for our cost estimate model, CREATE does not supply values to
descriptor G; rather, the values of G have been previously estimated so that they may be
used as arguments to cost functions.

5.2.5 ERASE


File erasure is mainly a task of the underlying operating system of the
database. As such, costs attributable to file erasures cannot be analyzed
directly in the way that other operations are. So far we have assumed negligible
costs for releasing and acquiring blocks from a pool of available blocks.
Therefore, as a first approximation, the cost of erasing a file is negligible:

ERAS = 0

There are no records output by an ERASE:

nERAS = 0

5.3  A  Notation  for  Transactions 

In the following paragraphs, a notation for modeling transactions is
presented. The purpose of introducing the notation is to show relationships
between transactions and database operations (i.e., simple file, linkset, and
auxiliary operations), and to serve as an aid in the process of developing cost
functions for transactions. Because the notation is based on Pidgin ALGOL
([AHU74]), formal declarations involving data types are avoided. It is to be
understood that we are not proposing this notation as a language for
transactions.

Transactions are procedures that perform operations on a database and
that output zero or more records. Records that are output are usually used as
arguments in subsequent calls of other transactions. Simple file, linkset, and
auxiliary operations are examples of transactions. In our notation, transactions
can be defined and subsequently invoked. A transaction T can be defined by a
statement of the form:

TRANSACTION T( param_list ) { S( param_list ) }    (F1)

param_list is the sequence of formal parameters of T. By S(param_list) we
mean that S is a sequence of executable statements that involve the parameters
of param_list. During an execution of T, records may be output. For each
record r that is to be output, the statement

OUTPUT( r )

must be executed. There can be any number of OUTPUT statements among the
executable statements of T. An execution of T terminates whenever a RETURN is
encountered or upon completion of execution of the last statement of T.

To facilitate the manipulation of records by transactions, record variables
are introduced. Just as an integer variable assumes an integer as its value, a
record variable assumes a record as its value. Some useful operations associated
with record variables are attribute value extraction and record concatenation.
If x and y are record variables, x[A] denotes the value of attribute A in
record x, and x,y denotes the concatenation of records x and y.

A transaction is invoked by using its name with the desired arguments. We
denote a call of transaction T by T(arg_list). Statements involving calls of
transactions assume a number of different formats, called statement forms, each
reflecting the number of records that are output by a transaction. A statement
form for calling a transaction which outputs no records is:

T( arg_list )    (F2)

A statement form for calling a transaction that outputs precisely one
record is:

x := T( arg_list )    (F3)

(F3) means that the output record of T is assigned to record variable x.

^ Note that we have adopted a more verbal notation for displaying arguments of database
operations than the more concise notation of (F1). We have done so only for expository
reasons. In an actual language, operations like "REMOVE x FROM F" would appear as
"REMOVE(x,F)".

A statement form for calling a transaction that outputs zero or more
records is:

FOR x := T( arg_list ) DO { S(x) }    (F4)

S(x) denotes a sequence of executable statements involving record variable x.
(F4) means that each time a record is output by T, it is assigned to x and S(x) is
executed. An execution of a statement of form (F4) terminates when T(arg_list)
terminates.

To illustrate some of the above ideas, consider the transaction
CROSS_PRODUCT(F,G) which returns the cross product of files F and G:

TRANSACTION CROSS_PRODUCT( F,G )
{ FOR x := RETRIEVE F DO
    { FOR y := RETRIEVE G DO
        { OUTPUT(x,y); };
    };
}

The outer FOR loop reads F one record at a time. For each record in F, the
inner FOR loop reads G one record at a time. OUTPUT(x,y) outputs a pair of
records belonging to the cross product of F and G.
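The FOR and OUTPUT constructs behave much like a generator that hands records to its caller one at a time. As an informal analogy only (Python generators are not part of the notation, and modeling files as lists and records as tuples is our assumption), CROSS_PRODUCT could be sketched as:

```python
def retrieve(file_records):
    """Stand-in for RETRIEVE: outputs the records of a file one at a time."""
    for record in file_records:
        yield record

def cross_product(f, g):
    """FOR x := RETRIEVE F DO { FOR y := RETRIEVE G DO { OUTPUT(x,y) } }."""
    for x in retrieve(f):          # outer loop reads F once
        for y in retrieve(g):      # inner loop rereads G for each x
            yield x + y            # tuple concatenation plays the role of x,y
```

Each `yield` corresponds to one execution of OUTPUT, and the generator finishes when the last statement of the transaction completes, which mirrors the termination rule given for transactions.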

Whenever a transaction outputs a record x, x is a copy of all or a part of
some record x' of some file. In order to effect modifications of x' or a removal of
x', a correspondence between x and x' must be established. Such a
correspondence may be realized using record variables. ^

^ In CODASYL databases, such a correspondence is realized by currency indicators
([CODA70]).

Each record variable is assigned a finite amount of storage, not all of which
is visible to a user. The portion that is visible contains the attribute values of
the output record x. The portion that is hidden contains a physical address
pointer to record x'. In this way, a correspondence between x and x' is made.

The database operations that require this correspondence are REMOVE,
UPDATE, LINK, and UNLINK. A way of optimizing these operations is to have the
block that contains x' in main memory when x' is UPDATEd or REMOVEd, thereby
eliminating the need for reaccessing the block. (It was previously accessed when
output record x was formed.) The purpose of the HOLD option in RETRIEVE,
INSERT, DELETE, RETRIEVE_CHILD, and RETRIEVE_PARENT is to specify that
such optimization is to take place. Namely, the block containing x' is to remain
in main memory until an UPDATE x or REMOVE x is issued, or until x is assigned
a new output record.

User defined transactions can output records in HOLD mode only if the
database operations that retrieved those records were executed in HOLD mode.
To facilitate this correspondence, the variable HOLD_phrase is supplied as an
explicit argument to a transaction. Its value indicates whether the records
returned by a transaction are to be output in HOLD mode. For example, if
L_Parent was defined as:

TRANSACTION L_Parent( child_rec, HOLD_phrase )
{ L_Parent := RETRIEVE_PARENT child_rec VIA L HOLD_phrase; }

then:

p := L_Parent(c,HOLD);

is equivalent to:

p := RETRIEVE_PARENT c VIA L HOLD;


Occasionally it is necessary to use the output file of one transaction as an
input file to a second. One way that this can be accomplished is to collect
output records in a temporary file and to use that temporary file as an argument
in subsequent calls of transactions. A statement form that enables this to be
done is:

X <- T( arg_list )    (F5)

(F5) means that file X is created to contain the output file of T(arg_list). Note
that the file assignment operator <- is different from the record assignment
operator :=.

An alternative method, called piping ([KiTh74]), avoids the use and costs
associated with temporary files. The idea is to identify the output file of one
transaction T with the input file of another transaction T'. A pipe from T to T'
utilizes an internal buffer which can hold one record. Each time a record is
output by T, it is stored in this buffer. When T' reads the record, the buffer is
flushed so that the next output record of T can be accepted. It is intuitively
clear that a pipe from T to T' is possible only if T' processes the records of its
piped file sequentially and reads the piped file only once. For example, file
F of CROSS_PRODUCT can be a piped file; file G cannot, since G is read many
times.
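The restriction on piped files can be seen in a small experiment. In the Python sketch below, a generator stands in for a pipe, since it too can be read only once and only sequentially. This is an analogy under our own assumptions (the names t and cross_product and the sample data are hypothetical), not a description of any particular implementation of piping.

```python
def t(records):
    """Stand-in for a transaction T that outputs records one at a time."""
    for r in records:
        yield r

def cross_product(f, g):
    """F is read once, sequentially; G is reread for every record of F."""
    for x in f:
        for y in g:
            yield x + y

F = [(1,), (2,)]
G = [("a",), ("b",)]

# Piping into F works: the piped stream is consumed exactly once.
piped_f = list(cross_product(t(F), G))

# Piping into G does not: the stream is exhausted after the first
# record of F, so later records of F find nothing to pair with.
piped_g = list(cross_product(F, t(G)))
```

Here piped_f holds the full cross product, while piped_g holds only the pairs formed with the first record of F. This is why G must be a stored file rather than a piped one.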

Let < T(arg_list) > denote the output file of T(arg_list). A notation that
indicates that the output file of T(arg_list) is the file argument X of transaction
T'(X) is:

T'( < T(arg_list) > )    (F6)

When this notation is used, we will say the output of T is piped into T'. Thus,
"SORT < T(arg_list) > OVER sortkey" means SORT the output file of T(arg_list)
on ascending sortkey values. ^ As another example, a statement of the form (F5)
is equivalent to:

CREATE X FROM < T(arg_list) >

Table 5.3 summarizes the statement forms and notation introduced in this
section. Although other statement forms are possible, it is believed that this set
is sufficient to represent most transactions. Examples are given in Section 5.4
to illustrate the generality of this representation. Limitations of the
representation are discussed in Section 7.2.


form    statement or notation

(F1)    TRANSACTION T( param_list ) { S( param_list ) }
(F2)    T( arg_list )
(F3)    x := T( arg_list )
(F4)    FOR x := T( arg_list ) DO { S(x) }
(F5)    X <- T( arg_list )
(F6)    T'( < T( arg_list ) > )

Table 5.3 A Catalog of Statement Forms and Notation

5.3.1  Cost  Function  Composition  Rules 

A transaction can be described by two functions: a cost function and a
response set size function. The cost function estimates the cost of executing
the transaction; the response set size function estimates the number of records
that the transaction outputs. Quite often these functions are complex and
difficult to develop. In the following paragraphs, an aid to developing cost
functions is proposed. The aid is a set of rules which can be used to compose a
cost function from a model of a transaction. In the next section, an additional
set of rules is proposed for estimating response set size functions.

^ The end of the output file of T(arg_list) is reached when T(arg_list) terminates.

A transaction T is modeled by a sequence \((s_1; \ldots; s_k)\) of executable
statements. Let \(\$(s_i)\) be the cost of executing statement \(s_i\). Assuming
the statements of a transaction are processed in subscript order, the cost of
processing T, denoted $(T), is given by the rule:

\[ \$(T) = \sum_{i=1}^{k} \$(s_i) \qquad \text{(R1)} \]

For modeling simplicity, all statements that do not involve calls of transactions
are assumed to have negligible cost. Those statements that do involve calls of
transactions, namely those of forms (F2)-(F5), are therefore of primary concern.

Let $(Fi) be the cost of processing a statement of the form (Fi). Rules for
estimating the costs of processing statements of forms (F2) and (F3) are:

$(F2) = $(T(arg_list))    (R2)

$(F3) = $(T(arg_list))    (R3)

That is, the cost of processing any statement of the form (F2) or (F3) is simply
the cost of processing T(arg_list).

Let n(T(arg_list)) be the expected number of records that are output by
T(arg_list). A rule for estimating the cost of processing a statement of form
(F4) is:

$(F4) = $(T(arg_list)) + n(T(arg_list)) x $(S(x))    (R4)

(R4) accounts for the cost of executing T(arg_list) and the cost of processing
S(x) an average of n(T(arg_list)) times.
To illustrate the above rules, recall the CROSS_PRODUCT transaction of the
previous section. The cost function of CROSS_PRODUCT is:

$(CROSS_PRODUCT(F,G)) =

    { RET(F,ALL) + nRET(F,ALL) x
        { RET(G,ALL) + nRET(G,ALL) x
            { 0 } } }

    = RET(F,ALL) + nRET(F,ALL) x RET(G,ALL)

    = RET(F,ALL) + N^F x RET(G,ALL)

where F and G are taken to be the descriptors of F and G, and ALL is the
descriptor of a query that qualifies all records of a file:

query
descriptor    f    ef    kf    Ss

ALL           1    1     1     scan

As expected, the cost function of CROSS_PRODUCT equals the cost of reading file
F plus the cost of reading file G once for each record of F.
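Rules (R1)-(R4) amount to a recursive walk over the statements of a transaction. The following Python sketch shows one way such a walk might be written. The class names Call and ForEach, the function cost, and the representation of a statement sequence as a list are all our own assumptions for illustration; the costs and record counts of the individual calls are taken as given.

```python
from dataclasses import dataclass

@dataclass
class Call:
    """A call T(arg_list): statement forms (F2) and (F3)."""
    cost: float      # $(T(arg_list))
    n: float = 0.0   # n(T(arg_list)), expected number of records output

@dataclass
class ForEach:
    """FOR x := T(arg_list) DO { S(x) }: statement form (F4)."""
    call: Call
    body: list       # the statements of S(x)

def cost(stmt):
    if isinstance(stmt, list):        # (R1): sum over the statements
        return sum(cost(s) for s in stmt)
    if isinstance(stmt, ForEach):     # (R4): call once, body n times
        return stmt.call.cost + stmt.call.n * cost(stmt.body)
    if isinstance(stmt, Call):        # (R2), (R3)
        return stmt.cost
    return 0.0                        # statements without calls are free

def cross_product_cost(ret_f, n_f, ret_g, n_g):
    """Cost of CROSS_PRODUCT from the costs and sizes of the two RETRIEVEs."""
    inner = ForEach(Call(ret_g, n_g), ["OUTPUT(x,y)"])
    outer = ForEach(Call(ret_f, n_f), [inner])
    return cost([outer])
```

Evaluating cross_product_cost reproduces the closed form RET(F,ALL) + N^F x RET(G,ALL), because the OUTPUT statement contributes nothing and the inner loop is charged once per record of F.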

If X is the descriptor of X in (F5), a rule for estimating the cost of
processing a statement of form (F5) is:

$(F5) = $(T(arg_list)) + CON(X)    (R5)

where CON(X) estimates the cost of constructing file X. As discussed in Section
5.2.4, the following relationship holds between the number of records in file X
and n(T(arg_list)):

N^X = n(T(arg_list))
Although piping records of one transaction into another does not involve the
use of temporary files, estimating the cost of a pipe involves the use of
(temporary) internal files. An internal file is a file whose records are stored in
main memory. Operations on internal files are assumed to have negligible cost.
Rules for estimating the cost of piping are:

(R6)

1) For each occurrence of T'( <T(arg_list)> ), introduce the statement
   I <- T(arg_list) immediately before the statement containing
   T'( <T(arg_list)> ), and replace T'( <T(arg_list)> ) with T'(I). I is an
   internal file.

2) Apply rules (R1)-(R5).

The idea that underlies the above rules is that the cost of executing a pipe is
equivalent to the cost of using temporary files as long as operations on
temporary files have negligible cost.

To illustrate the above rules, recall that a statement of the form (F5) may
be written as

CREATE X FROM < T(arg_list) >;

Applying rule 1 of (R6), we obtain:

I <- T(arg_list);

CREATE X FROM I;

and by rule 2 of (R6):

$(F5) = $(T(arg_list)) + CON(I) + CRE(X,I,ALL)

      = $(T(arg_list)) + CON(I) + CON(X) + RET(I,ALL)

Since costs associated with internal files are negligible, the CON(I) and
RET(I,ALL) terms vanish and (R5) follows.

5.3.2  Response  Set  Size  Function  Composition  Rules 

Again, a transaction T is modeled by a sequence \((s_1; \ldots; s_k)\) of
executable statements. Let \(n(s_i)\) be the number of times an OUTPUT statement
is executed during an execution of statement \(s_i\). Assuming the statements of
a transaction are processed in subscript order, the number of records output by
T, denoted n(T), is given by the rule:

\[ n(T) = \sum_{i=1}^{k} n(s_i) \qquad \text{(R7)} \]

To estimate n(T), note that only OUTPUT statements and statements involving
calls of transactions, namely those of forms (F2)-(F5), need be considered. All
other statements do not involve OUTPUTs.

Clearly for any record r,

n(OUTPUT(r)) = 1    (R8)

Let n(Fi) be the number of times an OUTPUT is executed by a statement of
the form (Fi). Because statements of forms (F2), (F3), and (F5) do not involve
OUTPUT statements, the following rules are evident:

n(F2) = n(F3) = n(F5) = 0    (R9)

A rule for estimating the number of times an OUTPUT is executed by a
statement of the form (F4) is:

n(F4) = n(T(arg_list)) x n(S(x))    (R10)

(R10) accounts for executing S(x), which outputs n(S(x)) records, an average of
n(T(arg_list)) times.
To illustrate the above rules, recall the CROSS_PRODUCT transaction of
Section 5.3. The response set size function of CROSS_PRODUCT is:

n(CROSS_PRODUCT(F,G)) =

    { nRET(F,ALL) x
        { nRET(G,ALL) x
            { 1 } } }

    = nRET(F,ALL) x nRET(G,ALL) = N^F x N^G

where again, F and G are taken to be the descriptors of F and G, and ALL is the
descriptor of a query that retrieves all records of a file. As expected, the
number of records in a cross product equals the product of the file sizes.

Rules for estimating response set size functions when records are piped are
identical to those for estimating cost functions, except that rules (R7)-(R10) are
used. Again, the idea is to express pipes in terms of operations on temporary
(internal) files. For example, a statement of the form SORT_EXAMPLE:

FOR x := SORT < T(arg_list) > OVER sortkey DO { OUTPUT(x) };

translates to:

I <- T(arg_list);

FOR x := SORT I OVER sortkey DO { OUTPUT(x) };

Applying rules (R7)-(R10), we find the number of times OUTPUT is executed,
denoted n(SORT_EXAMPLE), to be:

n(SORT_EXAMPLE) = 0 + nSORT(I,ALL) x { 1 }

                = n(T(arg_list))

As expected, the number of times OUTPUT is executed equals the number of
records output by T(arg_list).
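The response set size rules can be applied mechanically in the same way as the cost rules. The Python sketch below is a minimal illustration under our own assumptions: the class names Output, Call, and ForEach and the function n_out are hypothetical, a statement sequence is represented as a list, and the expected record count of each call is supplied directly.

```python
from dataclasses import dataclass

@dataclass
class Output:
    """An OUTPUT(r) statement."""

@dataclass
class Call:
    """Forms (F2), (F3), (F5): the records of T are not OUTPUT."""
    n: float         # n(T(arg_list))

@dataclass
class ForEach:
    """Form (F4): the body S(x) runs n(T(arg_list)) times on average."""
    n: float         # n(T(arg_list))
    body: list       # the statements of S(x)

def n_out(stmt):
    if isinstance(stmt, list):       # (R7): sum over the statements
        return sum(n_out(s) for s in stmt)
    if isinstance(stmt, Output):     # (R8)
        return 1
    if isinstance(stmt, ForEach):    # (R10)
        return stmt.n * n_out(stmt.body)
    return 0                         # (R9) and all other statements
```

For SORT_EXAMPLE, the translated transaction is the list [Call(n), ForEach(n, [Output()])], where n stands for n(T(arg_list)); the file assignment contributes 0 and the loop contributes n, in agreement with the derivation above.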

5.4  Examples 

Physical databases can be described by directed graphs with labeled arcs
and vertices. Vertices that are represented by rectangles denote data files;
those that are represented by triangles denote index files. A linkset is
represented by an arc from the parent file to the child file; boldfaced arcs imply
child-to-parent connections are maintained. The labels of arcs and vertices are
the names of the corresponding linksets and simple files. Graphs constructed in
this manner are database graphs.

A database graph of a university database is shown in Figure 5.1. The
database is used by university officials to store course grades of students and to
maintain class listings of current course enrollments. It consists of six simple
files and four linksets. Four of the simple files are data files (Instructor,
Student, Grade, and Course), and the remaining two are index files. Note that the
Instructor file is not connected to other simple files via linksets. Also, note
that only linksets Course_Grades and Student_Grades maintain child-to-parent
connections.

Just as attribute values are stored within specific fields of a record, so too
are cell directories. Once names have been assigned to the fields of a record,
both attribute values and cell directories of a record can be addressed in the
same way. For example, if List_SG is the name of a field containing a cell
directory and s is a record, then s[List_SG] is the List_SG cell directory of s.

The cluster key and field names of attributes and cell directories for each
simple file of the university database are listed in Table 5.4. Note that Grade is
an unordered file and Course is a hash based file whose hash key is derived from
C# values.

File          Cluster Key          Attribute Names                          Cell Directory Name

Student       S#                   S#, Student_Name, Info                   List_SG

Grade         relative location    Course_Grade

Course        hash(C#)             Course_Name, Room_Held,                  List_CG
                                   Time_Held, Prof_Name

Instructor    Prof_Name            Prof_Name, Phone#, Office#,
                                   Building, Home_Address

Cname         Course_Name          Course_Name                              List_CC

Pname         Prof_Name            Prof_Name                                List_PC

Table 5.4 Field Names of the University Database

Following  the  convention  of  previous  chapters,  the  descriptor  of  a  linkset  L 
and  a  simple  file  F  will  be  designated  in  a  boldfaced  font  as  L  and  F.  Descriptors 
of  other  quantities  specific  to  a  transaction  are  defined  when  needed. 

In  the  following  pages,  a  number  of  transactions  involving  the  university 
database  are  presented.  The  processing  strategies  used  in  each  transaction  are 
not  always  optimal;  in  many  situations,  the  least  costly  approach  to  processing 
depends  on  the  data  and  implementation  of  the  database.  The  purpose  of  these 
examples  is  to  demonstrate  that  the  transaction  model  is  a  valuable  tool  for 
describing  a  wide  variety  of  processing  strategies.  Choosing  the  least  costly 
strategy  is  a  problem  which  is  not  addressed  in  this  thesis. 

Example 1. (Processing queries using indices.) Figure 5.2 shows a transaction
T1(x,y) which prints the times and locations of course x taught by instructor
y. The processing strategy is to access the cell directories that correspond
to the clauses (Course_Name=x) and (Prof_Name=y) from the index files Cname
and Pname. Intersecting these cell directories identifies the course records
that are to be accessed. ^ The "print" operation in T1 displays the times and
locations of the desired courses on the user's terminal. Print operations are
assumed to have negligible execution cost.

In addition to the descriptors of Cname, Pname, and CC, descriptors for the
queries "Course_Name=x" and "Prof_Name=y", and the cell directory of the
intersected lists are needed to evaluate $(T1(x,y)). Since T1 outputs no records,
n(T1(x,y)) = 0.

Example 2. (File Navigation.) Figure 5.3 shows a transaction
T2(x,HOLD_phrase) which outputs records of students who have taken course x.
Optionally, these records can be output in HOLD mode. The processing strategy
is to locate all instances of course x via the Cname index. For each course
record, all associated grade records are retrieved via the Course_Grades
linkset. For each grade record, the corresponding student record is retrieved via
the Student_Grades linkset. Note that the student records are output in random
order (see Example 5). Also note that T2(x,HOLD_phrase) contains nested
statements of the form (F4), and that this nesting is reflected in the expressions
characterizing $(T2(x,HOLD_phrase)) and n(T2(x,HOLD_phrase)). The number
of records output by T2 is estimated by the number of times course x was
offered multiplied by the average number of grades issued per course.

^ In transaction T1, if the linksets CC and PC are multilists, then either the
(Course_Name=x) or (Prof_Name=y) list will be followed when accessing Course records. So
that only those records that satisfy both Course_Name and Prof_Name clauses are returned,
it is necessary that the C-query argument of the RETRIEVE_CHILD be nonnull. If CC and PC
were inverted lists, a null C-query would suffice.
Example 3. (Processing queries using joins.) Figure 5.4 shows a transaction
T3 which prints the name and phone number of each instructor and the courses
that he teaches. Taking advantage of the fact that the Pname and Instructor
files have the same cluster key, the processing strategy is to relate records of
both files by a cluster key join. For each joined record pair, the PC cell directory
of each Pname record is used to access all records of courses taught by the
instructor. This processing strategy is a variation of the "indices on the join
columns" method (see [BlEs77]).

Example 4. (Modifying multiple files.) Figure 5.5 shows a transaction T4(x)
which removes a student from the university database given his student
number. The removal of a student record triggers the deletion of its grade
records and a removal of links that are connected to these records. The
processing strategy is to retrieve the student record and to access its grade
records via linkset Student_Grades, while UNLINKing them in the process. Each
grade record is UNLINKed from its associated course record followed by an
UPDATE of the course record and a REMOVE of the grade record. The transaction
concludes with a REMOVE of the student record.

Example 5. (Piping.) Figure 5.6 shows a transaction T5(x) which outputs
records of students who have taken course x in order of ascending student
names. Relying on transaction T2 to retrieve student records, the processing
strategy is to pipe the records of T2 into a SORT and to output the records of the
SORT. Since T5 does not output records in HOLD mode, the call of T2 does not
request output records of T2 to be in HOLD mode.

The  cost  of  executing  T5  is  the  cost  of  executing  T2  plus  the  cost  of  sorting 
the  records  output  by  T2.  The  number  of  records  output  by  T5  equals  that  of  T2. 

[Figure 5.3: TRANSACTION T2(x,HOLD_phrase) with cost expression
$(T2(x,HOLD_phrase)). Figure body illegible in the source scan; one surviving
fragment reads: RETRIEVE Cname (list_CC) WHERE Course_Name=x.]

[Figure 5.4: Transaction T3. Figure body illegible in the source scan; the
surviving first statement reads: FOR j_rec := CK_JOIN Pname WITH Instructor
(Phone#) OVER JOIN(Pname, ALL, Instructor, ALL). The caption ends: "instructor
and the courses that he teaches".]

Figure 5.5 Transaction T4(x): Delete a student record given its student number


CHAPTER 6. APPLICATIONS


In  this  chapter  we  consider  applications  of  the  simple  file,  file  evolution, 
linkset,  and  transaction  models.  We  begin  by  showing  how  these  models  can 
unify  a  number  of  formerly  disparate  works.  Next,  we  study  the  problems  of  file 
design  and  reorganization,  and  propose  a  new  method  for  their  solution.  Finally, 
we examine the significance of selecting storage structures and inverting
attributes in file design processes.

6.1  Unification 

In  the  following  pages,  a  number  of  works  will  be  unified  using  the  simple 
file,  file  evolution,  and  linkset  models.  The  methodology  is  to: 

(1)  decompose  the  physical  database  considered  in  the  work  to  identify 
and  describe  the  constituent  simple  files  and  linksets; 

(2) identify the basic simple file and linkset operations that are of
concern;

(3)  compose  cost  expressions  based  on  (1)  and  (2)  in  terms  of  variables  to 
be  optimized. 

The  simple  file  model  will  be  used  in  the  unification  of  all  works.  The  linkset 
model is used when works on index selection are addressed, and the file
evolution model is used when designs of hash based files are considered.

Batched searching of sequential and tree structured files ([ShGo76]). A
paper by Shneiderman and Goodman assessed the benefits of collecting queries
into a batch and processing them simultaneously, rather than processing
queries individually. The assessment strategy was to estimate the costs of
processing a query individually and processing k queries in a batch:


INDIVIDUAL(F) = RET(F, Q(1, F))

BATCH(k, F) = RET(F, Q(k, F))

where Q(k, F) is given by:

query
descriptor    f          ef         kf         Ss
Q(k, F)       k/N^(F)    k/N^(F)    k/N^(F)    cluster key search


The  benefits  accrued  by  batching  over  processing  queries  individually  were 
modeled  by  the  function  %_SAVINGS: 


$$\%\_SAVINGS(k, \mathbf{F}) = \frac{k \times INDIVIDUAL(\mathbf{F}) - BATCH(k, \mathbf{F})}{k \times INDIVIDUAL(\mathbf{F})}$$


If F describes a sequential file, [ShGo76] recommend the use of a partial
scan to process queries. ¹ If F describes a tree file, a cluster key search is
recommended.

Listed below are the values of %_SAVINGS predicted in [ShGo76] and by our
model for a variety of batch sizes. Descriptors of the sequential and B+ tree
files used in these comparisons are given in Tables 6.3 and 6.4. As in [ShGo76],
the topmost node of the B+ tree was assumed to be previously accessed and
main memory resident.


¹ For sequential files, a partial scan is identical to a cluster key search.

batch size    Shneiderman & Goodman    Simple File Model
k             %SAVINGS                 %SAVINGS

2             33.3                     33.3
5             66.6                     66.7
10            81.8                     81.8
20            90.5                     90.5
50            96.1                     96.1
100           98.0                     98.0

Table 6.1. Batching Requests in a Sequential File


batch size    Shneiderman & Goodman    Simple File Model
k             %SAVINGS                 %SAVINGS

10            1.4                      1.7
50            7.0                      7.2
100           12.3                     12.4
150           16.2                     16.3

Table 6.2. Batching Requests in a B+ Tree File


In both cases, there is good agreement. It is worth noting that values of
%_SAVINGS for tree files in [ShGo76] were obtained by iterating difference
equations.
In  contrast,  a  simple  and  direct  calculation  was  used  in  our  model. 
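
As a rough check on the sequential-file column of Table 6.1, the sketch below evaluates the %_SAVINGS ratio under two assumptions that are not taken from the source: query targets are uniformly distributed over the file, and a partial scan costs the fraction of the file read up to the last requested record. It does not use the model's RET cost function; the function name is hypothetical.

```python
def pct_savings_sequential(k: int) -> float:
    """Percent savings from batching k queries on a sequential file.

    Assumptions (not from the source): uniform targets, and a partial scan
    whose cost is the fraction of the file read up to the furthest target.
    """
    individual = 0.5          # expected position of a single target
    batch = k / (k + 1)       # expected position of the furthest of k targets
    return 100.0 * (k * individual - batch) / (k * individual)


if __name__ == "__main__":
    for k in (2, 5, 10, 20, 50, 100):
        print(k, round(pct_savings_sequential(k), 1))
```

Under these assumptions the ratio simplifies to (k-1)/(k+1), which gives the same rounded values as the Simple File Model column of Table 6.1.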

Transposed file design ([Hoff75], [MaSe77], [Nia78]). A transposed file can
be envisioned as a column partitioning of a table of records. Each partition,
called a subfile, is implemented as an unordered file with the property that the
ith record of the original table can be reconstructed by adjoining the ith record
of each subfile. An objective of designing transposed files is to determine
attribute-subfile assignment pairs so that the anticipated file usage costs are
minimized.

To illustrate how an uncomplicated transposed file design problem is
formulated, let BLOCKSIZE be the size of a block in bytes, LENGTH_j be the
length of an attribute A_j field in bytes, and N be the total number of records.
Also, let the attribute-subfile assignment pairs be specified by X_jk:
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Ck 


Maxk  Rindex  Split  Ascend  A  Ao  S 


:C 


1  logical  valued 

0 

0  0 

1  A 

0  S 

0 

10000 

level  i 

Ph 

Ra 

1^  Mr, 

H,  C, 

Pf-uXU 

PoTii  PmeTi 

X 

1 

10000 

1  1 

0 

10000  0 

0 

0 

0 

0 

1 

0 

1 

1 

0 

1  0 

0 

1 

0 

0 

10000 

Table  6.3.  Descriptor  of  a  Sequential  File 

L 

Ck 

Maxk 

Rindex  Split  Ascend  A 

Ao  S 

So 

N 

1  logical  valued 

0 

1  1 

1  A 

0  S 

0 

1000000 

level  i 

Pi 

Roi 

Mr, 

//,  G, 

^i 

PfvilU 

Pov^  PmeTi 

Zi 

2 

100 

1 

1 

100  0 

0 

1 

0 

0 

1 

1 

100 

1 

50 

100  0 

0 

1 

0 

0 

100 

0 

100 

1 

50 

100  0 

0 

1 

0 

0 

10000 

Table 6.4. Descriptor of a B+ Tree File

L 

Ck 

Maxk  Rindex 

Split 

Ascend 

A  Ao 

s 

So  N 

1  relative  location  0 

1 

0 

0 

A  0 

s 

0  N 

level  i 

Pi 

Roi 

Mr, 

H,  C, 

fit 

Pfulli 

PoUi 

PmeTi  Zi 

1 

1 

0 

/ijt  0 

0 

0 

0 

0 

1 

0 

Tie 

1 

0 

N/ht  0 

0 

1/rt 

0 

0 

hjc 

Table 6.5. Descriptor of an Unordered File Implementation of Subfile SF_k

L  Ck 

Maxk 

Rindex 

Split  Ascend 

A  Ao 

S  So 

N 

1  hash 

IX// 1 

1 

0 

0 

A  Ao 

S  So 

(^/o+Co) 

level  i 

Pi 

Roi 

Mri 

//,•  c, 

Pfulli 

PoZLi 

PmeTi  Zi 

1 

//i 

1 

0 

Hi  0 

0 

0 

0 

0 

1 

0 

Pq 

1 

0 

//o  Go 

Hq 

Pfullo 

PoUq 

0 

Hi 

Table  6.6.  Descriptor  of  a  Hash  Based  File 

$$X_{jk} = \begin{cases} 1 & \text{if attribute } A_j \text{ is in subfile } SF_k \\ 0 & \text{otherwise} \end{cases}$$

Common restrictions placed on X_jk are that there exists at least one attribute
assigned to a subfile ($\sum_j X_{jk} \ge 1$) and that each attribute is
assigned to precisely one subfile ($\sum_k X_{jk} = 1$). When at most two
subfiles are considered, the record segmentation problem is addressed (see
[MaSe77]).

The descriptor SF_k of subfile SF_k (i.e., an unordered file) is given in Table
6.5. The blocking factor r_k and number of nodes h_k in SF_k are specified by:

$$r_k = \left\lfloor \frac{BLOCKSIZE}{\sum_j X_{jk} \times LENGTH_j} \right\rfloor$$

$$h_k = \left\lceil N / r_k \right\rceil$$

Record retrievals from a transposed file are specified as data requests,
which are (query, attribute output list) pairs. The query of a data request
qualifies records; the output list specifies the data of these records that are
desired. To quantify DATA_REQUEST_i, let:

$$NEED_{ij} = \begin{cases} 1 & \text{if } DATA\_REQUEST_i \text{ requires attribute } A_j \text{ data for output or qualification} \\ 0 & \text{otherwise} \end{cases}$$

$$ACCESS_{ik} = \begin{cases} 1 & \text{if } DATA\_REQUEST_i \text{ requires accessing subfile } SF_k \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 1 & \text{if } \sum_j NEED_{ij} \times X_{jk} > 0 \\ 0 & \text{otherwise} \end{cases}$$

Suppose that a subfile is scanned whenever any one of its attributes is used
in processing a data request, and that only data requests are to be performed.
A retrieval and storage cost function for a transposed file is:

$$S\&R\_COST = \sum_i \sum_k \left( fret_i \times ACCESS_{ik} \times RET(\mathbf{SF}_k, ALL) \right) + \sum_k STOR(\mathbf{SF}_k) \qquad (6.1)$$

where:

fret_i = frequency of DATA_REQUEST_i

S_s(ALL) = scan ²

A transposed file design problem is to minimize S&R_COST with respect to X_jk. ³
By observation, S&R_COST is minimized when X_jk = 1 for all j=k, 0 otherwise.
That is, each attribute is assigned to a distinct subfile. The resulting collection
of subfiles is referred to as a fully transposed file.
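
The definitions of r_k, h_k, ACCESS_ik, and (6.1) can be sketched in code as follows. This is a minimal illustration, not the model's own cost functions: it assumes that scanning a subfile costs one unit per block (RET(SF_k, ALL) = h_k) and that storage costs a fixed amount per block. The function names and the sample attribute lengths are hypothetical.

```python
from math import ceil, floor


def subfile_blocks(X, length, n_records, blocksize):
    """Return h_k, the number of blocks in each subfile, for X[j][k] in {0, 1}."""
    blocks = []
    for k in range(len(X[0])):
        rec_len = sum(X[j][k] * length[j] for j in range(len(length)))
        r_k = floor(blocksize / rec_len)        # blocking factor r_k
        blocks.append(ceil(n_records / r_k))    # number of nodes h_k
    return blocks


def transposed_sr_cost(X, need, fret, length, n_records, blocksize,
                       stor_per_block=0.0):
    """Evaluate (6.1) with the assumed costs RET = h_k and STOR = rate * h_k."""
    h = subfile_blocks(X, length, n_records, blocksize)
    cost = stor_per_block * sum(h)
    for i, freq in enumerate(fret):
        for k in range(len(h)):
            # ACCESS_ik = 1 when request i needs any attribute stored in SF_k
            access = any(need[i][j] and X[j][k] for j in range(len(length)))
            if access:
                cost += freq * h[k]
    return cost


if __name__ == "__main__":
    length = [4, 20, 76]                          # hypothetical field lengths
    need = [[1, 0, 0], [0, 1, 0]]                 # two data requests
    fret = [10, 1]
    full = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # fully transposed
    single = [[1], [1], [1]]                      # one subfile holds everything
    print(transposed_sr_cost(full, need, fret, length, 10000, 1000))
    print(transposed_sr_cost(single, need, fret, length, 10000, 1000))
```

With these sample values the fully transposed assignment is far cheaper, because each request scans only the narrow subfile it needs.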

In an analogous and straightforward way, operations involving record
updates, insertions, and deletions can be included in the file design problem.
Unfortunately, solutions to these more general formulations are no longer
trivial. For a more comprehensive examination of this problem, see [Hoff75] and
[Nia78].

Searching transposed files ([Bat79]). When a data request is processed, not
all subfiles need be scanned. Since the index positions (i.e., relative location
keys) of qualified records are always known, it is necessary only to access those
blocks of a subfile that contain records whose relative location keys match those
of previously qualified records. In doing so, it is possible that a large number of
unnecessary block accesses can be avoided. ⁴ The problem of searching
transposed files primarily concerns record qualification ([Bat79]): in what
sequence should subfiles be examined so that the expected search costs are
minimized?

² Parameters f, ef, and kf of ALL need not be specified since they are not used in
estimating S&R_COST.

³ With the restrictions k=2, ACCESS_ik = 1, and the sum of the (possibly different)
blocksizes of the two subfiles bounded by a constant, (6.1) is identical in form to
the objective function in [MaSe77, p.282].

⁴ Savings are noticeable only when a small number of records are qualified by a
query. For queries that qualify more than a few percent of a file's records, the
simpler search strategy of subfile scanning is adequate.



To illustrate a simple formulation of a transposed file search problem,
suppose queries of a data request are of the form:

(A_1 = {values_1}) AND ... AND (A_q = {values_q})

where the A_j are distinct attributes and {values_j} represents the set of values
that can be assumed by A_j in a requested record. Such queries can be rewritten
as Q_1 AND Q_2 AND ... AND Q_t, where Q_i is a conjunction of all clauses whose
attributes belong to subfile SF_i.

Let SEQ = (i_1, ..., i_t) denote the sequence of subfile indices that specifies
the search order of the t subfiles. The first subfile is scanned to identify those
records that are subject to further qualification. The relative location keys of
these qualified records are used to access corresponding records in the second
subfile. Using the data of the second subfile for further qualification, the relative
location keys of records that remain qualified are used to access corresponding
records in the third subfile, and so on. Accessing records via relative location
keys involves a cluster key search on an unordered file.

As an aid to estimating the cost of processing the above operation, let F_i be
the selectivity of Q_i. F_i is the product of the selectivities of the clauses that
comprise Q_i. The expected cost of processing a query using subfile search
sequence SEQ is:

$$ECOST(SEQ) = RET(\mathbf{SF}_{i_1}, ALL_{i_1}) + \sum_{j=2}^{t} RET(\mathbf{SF}_{i_j}, Q(j, SEQ)) \qquad (6.2)$$


where SF_i is the descriptor of SF_i (see Table 6.5), and ALL_i and Q(j,SEQ) are
the query descriptors:

query
descriptor    f      ef     kf     Ss
ALL_i         F_i    1      1      scan
Q(j,SEQ)      (entries illegible; two are products taken from s=1)    cluster key search


The  problem,  therefore,  is  to  determine  a  subfile  search  sequence  SEQ  that 
minimizes  ECOST.  But  for  a  slightly  different  approximation  of  the  ^  function 
(eqn. (2.2)), the ECOST function in [Bat79] and (6.2) are identical. A
straightforward generalization of (6.2) models the cost of processing
disjunctive and batched queries.
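
The search-order problem can be illustrated with a brute-force sketch. Everything about the cost model here is an assumption made for illustration: the first subfile is scanned in full (h blocks), and the number of blocks touched when m records are fetched at random from a subfile of h blocks is estimated with a Cardenas-style formula h(1 - (1 - 1/h)^m), which is substituted for the approximation of eqn. (2.2) and is not claimed to match it. All names are hypothetical.

```python
from itertools import permutations


def blocks_hit(m, h):
    """Cardenas-style estimate of distinct blocks touched by m random accesses."""
    return h * (1.0 - (1.0 - 1.0 / h) ** m)


def ecost(seq, h, sel, n_records):
    """Expected cost of examining subfiles in the order seq (assumed model)."""
    cost = float(h[seq[0]])              # first subfile is scanned in full
    qualified = float(n_records)
    for pos in range(1, len(seq)):
        qualified *= sel[seq[pos - 1]]   # records surviving earlier subfiles
        cost += blocks_hit(qualified, h[seq[pos]])
    return cost


def best_sequence(h, sel, n_records):
    """Enumerate all orders and return one with minimal expected cost."""
    return min(permutations(range(len(h))),
               key=lambda s: ecost(s, h, sel, n_records))


if __name__ == "__main__":
    h = [40, 200, 770]           # blocks per subfile (hypothetical)
    sel = [0.001, 0.5, 0.5]      # selectivity F_i of each Q_i (hypothetical)
    print(best_sequence(h, sel, 10000))
```

Enumeration is only practical for a handful of subfiles; it is used here because it makes the objective explicit.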

Differential files ([SeLo76]). Record additions and updates are normally
made to the file in which the records reside. An alternative strategy is to direct
all new and modified records into a separate file, called a differential file,
leaving the main file unchanged. Eventually, the differential file and main file
are merged to form a new main file.

[SeLo76] considered the problem of locating records in a differential and
main file complex. Bloom filters were proposed as a way of indicating whether a
requested record could be located in the differential file. A Bloom filter is a
main memory bit vector B of length M and a collection of X hashing functions.
Initially the differential file is empty and all bits of B are set to zero. Whenever a
record is added to a differential file, every transformation is applied to the
record's identifier and each of the X bits is set to 1. When a record is to be
retrieved, a search of the differential file can be eliminated if any one of a
record's X bits does not have the value of 1. However, as the density of 1s grows,
the filter's discriminating power declines, causing an increased number of
unnecessary searches of the differential file. Occasional merging of the main
and differential files maximizes the effectiveness of a Bloom filter.
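
The Bloom filter described above can be sketched as follows. This is an illustrative implementation, not the one in [SeLo76]: the X hashing functions are derived by salting SHA-256 with a seed, which is an assumption made here for convenience, and the bit vector is stored one byte per bit for clarity. The class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit vector B of length m_bits with x_hashes hashing functions."""

    def __init__(self, m_bits: int, x_hashes: int):
        self.m = m_bits
        self.x = x_hashes
        self.bits = bytearray(m_bits)    # one byte per bit, all zero initially

    def _positions(self, identifier: str):
        # Assumed hashing scheme: SHA-256 of "seed:identifier" for each seed.
        for seed in range(self.x):
            digest = hashlib.sha256(f"{seed}:{identifier}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, identifier: str) -> None:
        """Record that a record was added to the differential file."""
        for pos in self._positions(identifier):
            self.bits[pos] = 1

    def may_contain(self, identifier: str) -> bool:
        """False means the differential file search can be skipped."""
        return all(self.bits[pos] for pos in self._positions(identifier))


if __name__ == "__main__":
    bf = BloomFilter(m_bits=1024, x_hashes=3)
    bf.add("student-42")
    print(bf.may_contain("student-42"))
```

A True result only means the differential file must be searched; the record may still be absent, which is the source of the unnecessary searches mentioned above.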

Let M and D(t) be the descriptors of the main and differential file at time t.
M is not a function of t since, by definition, the main file remains unchanged.
Assuming updates are independent, are uniformly distributed over all records,
and arrive over time at a fixed rate r, [SeLo76] found the probability that the
differential file is examined at time t to be:

$$P_1(t) = \left( 1 - \exp(-rtX/M) \right)^{X},$$

the probability that the main file will be accessed at time t:

$$P_2(t) = 1 - P_1(t) + P_1(t) \times \exp(-rt/N^{(\mathbf{M})}),$$

and the size of the differential file at time t:

$$N^{(\mathbf{D}(t))} = N^{(\mathbf{M})} \left( 1 - \exp(-rt/N^{(\mathbf{M})}) \right).$$

In P_1(t), M is the length of the bit vector B; in the superscripts, M is the
descriptor of the main file. Because it is possible for both the main and
differential file to be examined when searching for a record,
P_1(t) + P_2(t) ≥ 1.
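
The three expressions attributed to [SeLo76] can be evaluated directly. The sketch below transcribes them as read from the text above; the parameter names (r for the update rate, x for the number of hashing functions, m_bits for the bit vector length, n_main for the number of main file records) are labels chosen here, not the source's.

```python
from math import exp


def p_differential(t, r, x, m_bits):
    """P1(t): probability that the differential file is examined."""
    return (1.0 - exp(-r * t * x / m_bits)) ** x


def p_main(t, r, x, m_bits, n_main):
    """P2(t): probability that the main file is accessed."""
    p1 = p_differential(t, r, x, m_bits)
    return 1.0 - p1 + p1 * exp(-r * t / n_main)


def differential_size(t, r, n_main):
    """Expected number of records in the differential file at time t."""
    return n_main * (1.0 - exp(-r * t / n_main))


if __name__ == "__main__":
    for t in (0, 10, 100):
        p1 = p_differential(t, r=50, x=3, m_bits=100_000)
        p2 = p_main(t, r=50, x=3, m_bits=100_000, n_main=20_000)
        print(t, round(p1, 4), round(p2, 4), round(differential_size(t, 50, 20_000)))
```

At t = 0 the differential file is empty, so P1 is 0 and P2 is 1; for later times the two probabilities sum to at least 1.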

Although no specific implementation of a main file and differential file was
considered in [SeLo76], we can still write an equation for estimating the
retrieval cost of a single record at time t:

$$R\_COST(t) = P_1(t) \times RET(\mathbf{D}(t), ONE) + P_2(t) \times RET(\mathbf{M}, ONE^{*})$$

where ONE and ONE* are query descriptors compatible with D(t) and M.
Supplying values to D(t) and M enables a variety of implementations of a
differential file and main file complex to be evaluated in a simple way.

Index  selection  ([King74],  [Schk75],  [YuWo75],  [AnBe77]).  The  problem  of 
index  selection  is  to  determine  which  attributes  of  a  data  file  to  invert  in  order 
to  minimize  the  anticipated  transaction  processing  costs. 

Consider the following formulation of the index selection problem which
concerns the minimization of retrieval and storage costs. Suppose queries are
conjunctions of clauses. Let QUERY_i be characterized by:

$$EC_{ij} = \begin{cases} 1 & \text{if } QUERY_i \text{ has an } (A_j = value) \text{ clause} \\ 0 & \text{otherwise} \end{cases}$$

F_ij = selectivity of the A_j clause in QUERY_i; F_ij = 1 if no clause exists
fret_i = frequency of processing QUERY_i

Let D be the descriptor of the data file, and I_j be the descriptor of the
index file for attribute A_j. Typically, D describes an unordered file and I_j
describes a B+ tree file. However, there are variations. Let L_j be the descriptor
of the linkset that connects the A_j index file to the data file. Clearly, L_j
describes an inverted list or pointer array structure. It follows that N^(D) is the
number of records in the data file and N^(I_j) is the number of distinct A_j values
for which individual index records exist. Since a pointer to each data file record
appears precisely once in each index file, we have the following relationship: ⁵


(6.3) 

Queries are processed in the following manner. For each (A_j = value) clause
in a query, the corresponding inverted list is accessed, provided, of course, that
an index file for A_j exists. These lists are intersected and the records identified
by the resultant list are retrieved by a RETRIEVE_CHILD operation. Should the
case arise that no inverted lists are available, the query is processed by
scanning the data file.


The presence of an index file for attribute A_j is specified by the indicator
variable INDEX_j:

$$INDEX_j = \begin{cases} 1 & \text{if attribute } A_j \text{ is inverted} \\ 0 & \text{otherwise} \end{cases}$$


⁵ This relationship is correct when every inverted list can be stored in a single
record. When inverted lists have many pointers, multiple records may be used to
contain the list, and (6.3) is approximate.

The situation where no inverted lists are available to process QUERY_i is given by
NOLIST_i:

$$NOLIST_i = \begin{cases} 1 & \text{if } \sum_j INDEX_j \times EC_{ij} = 0 \\ 0 & \text{otherwise} \end{cases}$$


Let IQ_j be the query descriptor for accessing an inverted list from the A_j
index file; ALL_i be the query descriptor for scanning the data file to resolve
QUERY_i; and CQ_i be the child query descriptor for retrieving data file records
given the intersection of all relevant inverted lists for QUERY_i:

query
descriptor    f              ef           kf           Ss
IQ_j          1/N^(I_j)      1/N^(I_j)    1/N^(I_j)    cluster key search
ALL_i         $\prod_j F_{ij}$    1            1            scan

child query
descriptor    f, ef, k                                          Lss
CQ_i          (entries illegible; products taken over j)        linkset scan

where

$$O_{ij} = \begin{cases} F_{ij} & \text{if } INDEX_j = 1 \text{ and } EC_{ij} = 1 \\ 1 & \text{otherwise} \end{cases}$$


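A storage and retrieval cost function for an inverted file is therefore:

$$\begin{aligned} S\&R\_COST ={}& STOR(\mathbf{D}) + \sum_j INDEX_j \times STOR(\mathbf{I}_j) \\ &+ \sum_i fret_i \times \Big[ (1 - NOLIST_i) \times \Big( \sum_j INDEX_j \times EC_{ij} \times RET(\mathbf{I}_j, IQ_j) + RETC(\mathbf{L}_j, CQ_i) \Big) \\ &\qquad\quad + NOLIST_i \times RET(\mathbf{D}, ALL_i) \Big] \end{aligned}$$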

Note that the linkset L_j of RETC in S&R_COST supplies only linkset design
information to RETC. Since all linksets share a common design (i.e., all are
inverted lists), any of the L_j could have been used as an argument.

Minimizing S&R_COST with respect to the indicator variables INDEX_j
identifies an optimal indexing set. A more general formulation of this problem is
examined in Section 6.3.
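
For a small number of attributes, the minimization over the indicator variables INDEX_j can be sketched by enumeration. The sketch below is an illustration under stated assumptions: the component costs (storage of D and of each index, the cost of fetching one inverted list, the RETRIEVE_CHILD cost, and the data file scan cost) are supplied by the caller as numbers or callables rather than computed from descriptors, and each query's cost is weighted by fret_i. All function and parameter names are hypothetical.

```python
from itertools import product


def nolist(index, ec_i):
    """NOLIST_i: 1 when no inverted list is available for the query."""
    return 0 if any(ix and ec for ix, ec in zip(index, ec_i)) else 1


def index_sr_cost(index, ec, fret, stor_d, stor_i, ret_i, retc, ret_scan):
    """Storage plus retrieval cost for one choice of INDEX_j values."""
    cost = stor_d + sum(ix * s for ix, s in zip(index, stor_i))
    for i, freq in enumerate(fret):
        if nolist(index, ec[i]):
            cost += freq * ret_scan(i)          # scan the data file
        else:
            lists = sum(ix * e * ret_i[j]
                        for j, (ix, e) in enumerate(zip(index, ec[i])))
            cost += freq * (lists + retc(i, index))
    return cost


def best_index_set(n_attrs, **costs):
    """Try every subset of attributes and return the cheapest INDEX vector."""
    return min(product((0, 1), repeat=n_attrs),
               key=lambda ix: index_sr_cost(ix, **costs))


if __name__ == "__main__":
    choice = best_index_set(
        2,
        ec=[[1, 0], [0, 1]], fret=[100, 1],
        stor_d=0, stor_i=[50, 2000], ret_i=[3, 3],
        retc=lambda i, index: 10, ret_scan=lambda i: 1000,
    )
    print(choice)
```

In this toy instance only the first attribute is worth inverting: the second index costs more to store than it saves on its rarely issued query. Enumeration grows as 2^n and is shown only to make the objective concrete.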

Hash based file designs ([Van73], [SeDu76]). Efficient designs for hash
based files have been studied in two different environments: load time and
steady state. The load time environment of [SeDu76] is retrieval only: there are
no record updates, insertions, or deletions. The steady state environment of
[Van73] was also retrieval only, where a large, but equal, number of insertions
and deletions had occurred prior to retrieval.

A descriptor HB of a hash based file considered in both works is given in
Table 6.6. H_1 is the number of buckets in the file, and R_0 is the record capacity
of a primary block. Retrieving records via their hash keys involves a cluster key
search on a hash based file. A storage and retrieval cost function for a hash
based file is:

S&R_COST = fret × RET(HB, ONE) + STOR(HB)    (6.3)

where:

fret = frequency of retrievals/unit time

and

query
descriptor    f             ef            kf            Ss
ONE           1/N^(HB)      1/N^(HB)      1/N^(HB)      cluster key search

The problem addressed in both works was to minimize S&R_COST with respect
to H_1 and R_0. Substituting values of HB and ONE into (6.3) yields:

$$S\&R\_COST = fret \times (A + A_o \times n_0) + H_1 \times (S + S_o \times C_0)$$

which is identical in form to the objective functions in [Van73, p.28] and
[SeDu76, p.320].

What  distinguishes  these  works  are  the  values  that  are  assigned  to  Hq,  Oq, 
and  rio*  Values  for  these  parameters  can  be  determined  in  the  following  way. 
For  [SeDu76],  the  load  time  node  occupancy  distribution  is  given  by  eqn  (3.2); 
for  [Van73],  the  steady  state  node  occupancy  distribution  is  defined  by  eqn 
(3.17),  where  matrix  eqns  (3.15)  and  (3.16)  correspond  to  eqns  (3.3)  and  (3.4). 
Applying the definitions of Table 3.1 to these distributions yields the values for
Hq,  Gq,  and  Go- 

A more general formulation of this problem is studied in the following
section.

6.2  Optimal  File  Designs  And  Reorganization  Points 

File  design  is  the  problem  of  selecting  an  implementation  for  a  file.  Its 
solution  is  influenced  by  knowledge  of  the  anticipated  usage  of  the  file.  Due  to 
record  insertions  and  deletions,  the  selected  implementation  is  "optimal"  only 
for  a  brief  time.  In  the  case  of  hash  based  and  indexed  sequential  structures, 
growing  files  have  an  increasing  volume  of  records  accumulating  in  overflow 
areas,  thereby  causing  a  deterioration  in  the  file’s  performance.  The  negative 
impact of performance deterioration is reduced by occasional reorganizations.
File reorganization is the problem of determining when a file should be
reorganized.

A literature survey reveals that the file design and reorganization problems
have been addressed in isolation. Consider the case of hash based files.
Efficient designs for these files have been studied in two different environments:
load time [SeDu76] and steady state [Van73]. In both cases, the files that were
considered had a zero net growth. Performance deterioration was shown to be
minimal in [Van73], while deterioration did not occur in [SeDu76]. As a
consequence, it was unnecessary to consider file reorganizations.

Previous  work  concerning  file  reorganization  assumed  an  efficient  file 
design  had  already  been  selected.  Given  that  a  file’s  performance  deteriorates 
linearly  with  time,  [Shn73]  determined  optimal  reorganization  points  when  the 
intervals  between  reorganizations  were  of  equal  length.  [Tuel78]  generalized  the 
solution to include reorganization intervals of nonequal length. [LoMu77] showed
how Shneiderman's result was related to optimal policies for file backup,
checkpointing, and batch updating. In a formulation with less restrictive
assumptions, [YDT76] proposed a heuristic for determining reorganization points.

In the following sections, a method is proposed for solving the problems of
file design and reorganization. ⁶ Applications of the method to hash based and
indexed sequential files are considered.

6.2.1 Problem Formulation

Common operations on a single simple file include retrieving, inserting, and
deleting individual records, updating previously accessed records, and
retrieving all records of a file. Functions which estimate the cost of performing
these operations on a file with descriptor F are:

function         cost of:

RET(F, ONE)      retrieving a record given its cluster key identifier
RET(F, ALL)      retrieving all records
INS(F)           inserting a record
DEL(F, ONE)      deleting a record given its cluster key identifier
UPD(F)           updating a previously retrieved record

where ONE and ALL are the query descriptors:

query
descriptor    f            ef           kf           Ss
ONE           1/N^(F)      1/N^(F)      1/N^(F)      cluster key search
ALL           1            1            1            scan

⁶ The proposed solution for file reorganization presented here is similar to that
in a recently circulated work by Ramirez ([Ram80]). The approach in that work is
based on data structures, quite different from our approach, but the method for
determining reorganization points is similar.


Statistics that characterize a file's usage in terms of these operations are:

fret = number of times each record is retrieved via its identifier per week
fsc  = number of file scans per week
fins = number of insertions per week
fdel = number of deletions per week
fupd = number of times each record is retrieved and updated per week

Note that the time interval of a week was chosen only for expository reasons.
Any time interval could be used. Also for similar reasons, usage statistics were
assumed constant over all weeks. Modeling variations in usage is accomplished
easily by indexing statistics by time (e.g., fsc_t = number of scans in week t).

Performance deterioration occurs when operations become more expensive
to execute. This comes as a result of record insertions and deletions, which in
the case of hash based and indexed sequential structures appears as changes in
the values of H_0, Pfull_0, and Pov_0. Because a file descriptor changes
with time, let F_t denote the descriptor of a file at (the end of) week t. F_0
describes the file at creation time (i.e., before insertions and deletions occur). If
the values of a descriptor remain constant over a week, the cost of using the file
during week t is: ⁷

$$\begin{aligned} Static(\mathbf{F}, t) ={}& (fret + fupd) \times N^{(\mathbf{F}_t)} \times RET(\mathbf{F}_t, ONE) + fsc \times RET(\mathbf{F}_t, ALL) \\ &+ fins \times INS(\mathbf{F}_t) + fdel \times DEL(\mathbf{F}_t, ONE) + fupd \times N^{(\mathbf{F}_t)} \times UPD(\mathbf{F}_t) + STOR(\mathbf{F}_t) \end{aligned}$$

However,  since  file  descriptors  do  evolve,  the  usage  cost  of  a  file  during  week  t  is 
more  closely  approximated  by: 

Usage_Cost(F.t)  = 

Figures 6.1 and 6.2 illustrate performance deterioration curves predicted for
hash based and indexed sequential files in a variety of environments. File
descriptors for weeks 0, 4, 8, and 12 were known for each experiment.
Descriptors for intervening weeks were determined by linear interpolation:

$$\mathbf{F}_t = \mathbf{F}_i + \left( \frac{t - i}{4} \right) \left( \mathbf{F}_{i+4} - \mathbf{F}_i \right)$$

where i < t < i+4 and i = 0, 4, 8, or 12.

6.2.2  A  Solution 

A  method  for  determining  optimal  reorganization  points  and  file  designs  for 
files  with  fixed  lifetimes  will  be  developed.  We  begin  by  presenting  a  solution  to 
the  file  reorganization  problem. 

Let the lifetime of a file be T weeks. At the end of each week, a decision is
made to reorganize the file or not. The average number of records per node, or
loading factor, of a reorganized file is the constant s.

⁷ Note: the argument F of Static represents an array of descriptors indexed by t.

[Figure 6.1 plot not recoverable; vertical axis: Usage_Cost.]

Hash Based File:

Descriptor Values: R_0=Ro_0=5, N^(F_0)=20000, S=So=.05683, A=Ao=.00131

^0=4000 

Usage Statistics: fret=fupd=5, fsc=0, fins=1000


Figure  6.1.  Performance  Deterioration  of  a  Hash  Based  File 

[Figure 6.2 plot not recoverable; vertical axis: Usage_Cost.]

Indexed Sequential File:

Descriptor Values: R_0=Ro_0=5, N^(F_0)=20000, S=So=.05683, A=Ao=.00131

R  7,-.innn 

^cluster_index — 

Usage  Statistics:  fret=fupd=5,  fsc=7,  fins=1000 


Figure  6.2.  Performance  Deterioration  of  an  Indexed  Sequential  File 

Let $(i,j) be the sum of the costs of: 1) constructing the file at the end of
week i, 2) using the file during weeks i+1 through j, and 3) dumping the file at
the end of week j. Let F_{i,t} be the descriptor of a file at week t, where the file
was constructed at week i, and let DUMP(F) be the cost of dumping a file with
descriptor F. ⁸ $(i,j) is given by:

$$\$(i,j) = CON(\mathbf{F}_{i,i}) + \sum_{t=i+1}^{j} Usage\_Cost(\mathbf{F}_i, t) + DUMP(\mathbf{F}_{i,j})$$

Let Cost(t) be the minimal usage and reorganization cost for a file with a lifetime
of t weeks. Clearly,

Cost(0) = 0
Cost(1) = $(0,1)

To estimate Cost(2), note that the last reorganization was either at week 0 or
week 1. Choosing the situation with minimal cost yields:

Cost(2) = min[ Cost(0) + $(0,2), Cost(1) + $(1,2) ]

and in general,

$$Cost(t) = \min_{0 \le i < t} \left[ Cost(i) + \$(i,t) \right] \qquad (6.4)$$


† During reorganization, the dumping of one file is coincident with the creation
of another. File dumping involves retrieving records of a file in the order that
they are loaded into the new structure. This poses no problem for indexed
sequential structures since a scan retrieves records in the desired key order.
The situation is different for hash based files. Since a scan of one hash based
file need not retrieve records in the hash key order of another file, a file sort
may be required. Setting the number of internal buffers allocated to a SORT to
be 4, DUMP can be approximated by:

    DUMP(F) = PPT(F, ALL)                          for indexed sequential files and
                                                   steady state hash based files
            = [illegible](F, ALL, [illegible])     otherwise

By incrementing the index t, and progressively building upon previous
results, Cost(T) is obtained. Recording the value of i used in each Cost(t)
calculation, for which Cost(i) + $(i,t) is minimal, determines the weeks at which
the file should be reorganized. These are the optimal reorganization points for
the file.
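The bottom-up evaluation of (6.4) described above can be sketched as follows. This is a minimal illustration rather than the procedure used in the experiments: the function names and `toy_seg_cost` (a made-up stand-in for $(i,j)) are assumptions introduced only for this example.

```python
from typing import Callable, List, Tuple


def plan_reorganizations(
    T: int, seg_cost: Callable[[int, int], float]
) -> Tuple[float, List[int]]:
    """Evaluate recurrence (6.4) bottom-up and recover the reorganization weeks."""
    cost = [0.0] * (T + 1)   # cost[0] = Cost(0) = 0
    last = [0] * (T + 1)     # last[t] = the i that minimizes Cost(i) + $(i,t)
    for t in range(1, T + 1):
        best_i = min(range(t), key=lambda i: cost[i] + seg_cost(i, t))
        cost[t] = cost[best_i] + seg_cost(best_i, t)
        last[t] = best_i
    # Walk back from week T through the recorded minimizers.
    weeks, t = [], T
    while t > 0:
        t = last[t]
        weeks.append(t)
    weeks.reverse()          # weeks[0] is always 0, the initial construction
    return cost[T], weeks[1:]


def toy_seg_cost(i: int, j: int) -> float:
    """Hypothetical $(i,j): 10 to build and dump, plus usage that rises with age."""
    return 10.0 + sum(1.0 + (t - i) for t in range(i + 1, j + 1))


total, weeks = plan_reorganizations(7, toy_seg_cost)
print(total, weeks)
```

Each Cost(t) is computed once from earlier entries, so T(T+1)/2 evaluations of the segment cost suffice.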

The  file  design  problem  is  integrated  into  this  framework  by  allowing  s,  the 
loading  factor  of  a  reorganized  file,  to  become  an  optimization  variable.  $(i,j) 
becomes  $(s,i,j),  and  (6.4)  becomes: 

    Cost(t) = \min_{0 \le i < t} [ Cost(i) + \min_{s} { $(s,i,t) } ]        (6.5)

Recording the values of i and s used in each Cost(t) calculation, for which
Cost(i) + $(s,i,t) is minimal, determines the optimal file designs and
reorganization points for the file. It is worth noting that other parameters of a
file descriptor (e.g., block capacities R_0 and Ro_0, and file design parameters
Ck, Ascend, etc.) could also be optimization variables in file design processes.

Equation (6.5) can be evaluated efficiently using dynamic programming
techniques (see [Ram80], [AHU74]).
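A sketch of how (6.5) might be evaluated is given below, assuming the loading factor s is drawn from a small candidate list. The candidate list, `plan_designs`, and the cost function `toy_cost` are hypothetical and serve only to show the nested minimization.

```python
from typing import Callable, List, Sequence, Tuple


def plan_designs(
    T: int,
    factors: Sequence[float],
    seg_cost: Callable[[float, int, int], float],
) -> Tuple[float, List[Tuple[int, float]]]:
    """Evaluate (6.5): for each t, minimize over the last build week i and factor s."""
    cost = [0.0] * (T + 1)
    choice = {}                       # t -> (i, s) attaining Cost(t)
    for t in range(1, T + 1):
        best = None
        for i in range(t):
            # Inner minimization: the cheapest loading factor for weeks i+1..t.
            s = min(factors, key=lambda f: seg_cost(f, i, t))
            total = cost[i] + seg_cost(s, i, t)
            if best is None or total < best[0]:
                best = (total, i, s)
        cost[t], choice[t] = best[0], (best[1], best[2])
    plan, t = [], T
    while t > 0:
        i, s = choice[t]
        plan.append((i, s))           # build at week i with loading factor s
        t = i
    plan.reverse()
    return cost[T], plan


def toy_cost(s: float, i: int, j: int) -> float:
    """Hypothetical $(s,i,j): 3 to build and dump; weekly usage is cheapest when
    the file, growing by one record per node per week, holds 5 records per node."""
    return 3.0 + sum(1.0 + abs(s + (t - i) - 5) for t in range(i + 1, j + 1))


total, plan = plan_designs(6, [3, 4, 5], toy_cost)
print(total, plan)
```

In this toy setting the plan builds below the ideal loading so that the file grows into its best state, which mirrors the robustness observation made for Figure 6.3.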

6.2.3.  Experimental  Results 

Figure  6.3  shows  results  of  a  computation  experiment  concerning  a  hash 
based  file  with  an  initial  size  of  20000  records  and  a  growth  rate  of  400  records 
per  week.  For  a  lifetime  of  25  weeks,  the  optimal  strategy  is  to  construct  the 
file  at  week  0  with  a  loading  factor  of  16.7  records/node,  and  to  reorganize  the 
file  at  the  end  of  week  12  with  a  loading  factor  of  17.0. 

The dashed line of Figure 6.3 indicates the theoretically minimal usage cost;
i.e., the cost that would be attained if the file were reorganized each week.
(Usage costs do not include file construction and dumping costs.) The solid line
indicates the usage cost which corresponds to the optimal reorganization
strategy. Observe that when the file is reorganized, its performance is not
optimal. In time, however, the file grows into (and out of) an optimal state. File
designs that accommodate file growth in this manner are robust.
[Figure 6.3. Weekly Usage Costs for a Hash Based File With Reorganizations.
Vertical axis: Usage_Cost. Hash based file; descriptor values: R_0=Ro_0=20,
Nrec=20000, S=So=.05683, A=Ao=.00131; usage statistics: fret=fupd=.5, fsc=0,
fins=400, fdel=0; file lifetime: T=25.]

[Figure 6.4. Weekly Usage Costs for an Indexed Sequential File With
Reorganizations. Vertical axis: Usage_Cost. Indexed sequential file; descriptor
values: R_0=Ro_0=20, Nrec=20000, S=So=.05683, A=Ao=.00131,
R_cluster_index=55; usage statistics: fret=fupd=.5, fsc=7, fins=400, fdel=0;
file lifetime: T=25.]
Figure 6.4 shows results of an experiment concerning an indexed sequential
file of a size and usage comparable to the hash based file. For a lifetime of 25
weeks, the optimal strategy is to construct and reorganize the file using the
loading factor of 19 records/node. Reorganizations occur at the end of weeks 2,
4, 6, 8, 10, 13, 16, 19, and 22.

Unlike  the  hash  based  designs,  the  recommended  indexed  sequential 
designs  do  not  attain  a  theoretically  minimal  usage  cost.  This  is  due  primarily 
to  the  inverse  relationship  between  usage  costs  and  reorganization  frequency: 
the  suggested  design  reflects  a  balancing  of  these  tradeoffs.  Note  that  in  order 
to  accommodate  file  growth  and  less  frequent  reorganizations,  a  record  slot  is 
left  vacant  in  every  node  after  each  reorganization. 

In additional experiments, reorganization points were determined for file
designs with fixed initial loading factors. Some results of these experiments are
listed in Tables 6.7 and 6.8. Note that for those situations where the initial
loading factor of four yielded the minimal Cost, Costs for the loading factors of
three and five were at most 5% greater. This suggests that a file's performance is
not sensitive to initial loading factors. Additional evidence supporting this claim
is given in Figure 6.5. For the hash based file (Fig. 6.5a), the optimal loading
factor is approximately 17. However, using a loading factor as low as 12 or as
high as 21 yields a Cost that is within 6% of the minimum.† Therefore, choosing an
initial loading factor within 20% of the optimal value results in an acceptable
hash based file design.
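The acceptability criterion used here (a Cost within E = 6% of the minimum) is simple to apply mechanically. The sketch below is an illustration only; `acceptable_factors` is a hypothetical helper, and the sample costs are the Cost(T) values listed for fdel/fins = 0 in Table 6.7.

```python
from typing import Dict, List


def acceptable_factors(cost_by_factor: Dict[float, float],
                       tolerance: float = 0.06) -> List[float]:
    """Loading factors whose Cost(T) is within `tolerance` of the minimum."""
    best = min(cost_by_factor.values())
    limit = best * (1.0 + tolerance)
    return sorted(s for s, c in cost_by_factor.items() if c <= limit)


# Cost(T) for fdel/fins = 0 as read from Table 6.7 (hash based file, T = 20).
table_6_7 = {5: 23724, 4: 22862, 3: 23048}
print(acceptable_factors(table_6_7))          # E = 6%, as in the text
print(acceptable_factors(table_6_7, 0.02))    # a stricter, hypothetical E = 2%
```

With E = 6% all three initial loading factors qualify, which is the insensitivity the paragraph above points out.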

For the indexed sequential file (Fig. 6.5b), the optimal loading factor is 19.
Selecting a loading factor in the range of 15 to 21 yields a Cost within 6% of the
minimum. Therefore, allowing up to 20% free space in a primary block (i.e., an

† Designs whose costs are within E of the minimal cost are considered acceptable
or good designs. E was chosen to be 6% in our studies.
fdel/fins   Initial Loading   Reorganization Points   Cost(T)        Minimum
Ratio       Factor            (Week Numbers)                         Cost

0           5                 3,7,11,15               23724
            4                 5,12                    22862          *
            3                 3                       23048

.5          5                 5,10,15                 19839
            4                 6,13                    19241          *
            3                 -                       19461

1           5                 every 2 weeks           15496
            4                 every 2 weeks           15278
            3                 4,8,12,16               [illegible]

Hash Based File:
Descriptor Values: R_0=Ro_0=5, Nrec=20000, S=So=.05683, A=Ao=.00131
Usage Statistics: fret=fupd=5, fsc=0, fins=1000
File Lifetime: T=20

Table 6.7. Reorganization Points for Hash Based Files

fdel/fins   Initial Loading   Reorganization Points   Cost(T)        Minimum
Ratio       Factor            (Week Numbers)                         Cost

0           5                 every week              36201
            4                 5,12                    37547          *
            3                 7                       39261

.5          5                 every week              30289          *
            4                 2,5,8,11,14,17          31573
            3                 8                       33498

1           5                 every week              24381
            4                 every 2 weeks           25478
            3                 [illegible]             27571

Indexed Sequential File:
Descriptor Values: R_0=Ro_0=5, Nrec=20000, S=So=.05683, A=Ao=.00131,
                   R_cluster_index illegible
Usage Statistics: fret=fupd=5, fsc=7, fins=1000
File Lifetime: T=20

Table 6.8. Reorganization Points for Indexed Sequential Files

Figure 6.5. Loading Factor Sensitivity Curves for Files that are Reorganized


initial loading factor of .8R_0 to R_0) results in an acceptable indexed sequential
design.

Robust files are characterized by a small initial loading factor. Usually, file
reorganizations become less frequent as initial loading factors decrease. Robust
designs are therefore appropriate for fixed lifetime files where no
reorganizations occur. Figure 6.6 shows load factor sensitivity curves for two
files that are not reorganized during their 25 week lifetime. For these examples,
choosing a loading factor within 15% (20%) of the optimal value results in an
acceptable indexed sequential (hash based) design.

In a complementary series of experiments, optimal loading factors were
determined for files that were reorganized at constant intervals. For the hash
based file of Figure 6.7a, acceptable reorganization intervals can have lengths of
3 weeks or greater. In contrast, for the indexed sequential file of Figure 6.7b,
acceptable interval lengths are less than 20 weeks.

Note that the same hash based file was considered in the experiments of
Figures 6.3 and 6.7a. In Figure 6.3, the theoretically minimal usage cost was
achieved by reorganizing the file each week. From Figure 6.7a, we find that a
weekly reorganization strategy is 16% more costly than the optimal
reorganization strategy.

The  same  indexed  sequential  file  was  considered  in  Figures  6.4  and  6.7b. 
From  Figure  6.7b,  we  find  that  there  is  a  negligible  difference  in  costs  between  a 
weekly  reorganization  strategy  and  the  optimal  strategy. 

Owing to the relative insensitivities of file performance to reorganization
interval length, determining efficient file designs is more important than
determining optimal reorganization points. Since optimal loading factors need
not be calculated to a high accuracy, it appears that solutions to the file design
and reorganization problem for hash based and indexed sequential files can be
obtained easily.

Figure 6.6. Loading Factor Sensitivity Curves for Files that are not Reorganized


6.2.4.  Files  with  Unknown  Lifetimes 

Most files encountered in practice have indefinite lifetimes. So far, we have
only considered files with known lifetimes. To broaden the scope of our work,
recall that to determine optimal file designs and reorganization points requires
knowledge about the anticipated usage of a file. Such knowledge is obtained by
forecasting future usage trends from observations on current trends. Inherent
to forecasting is that short term predictions are more reliable than long term
predictions. Consequently, statistics that are used in performance calculations
apply only to a finite future period. If we identify this period with the lifetime of
the file, we can determine optimal file designs and reorganization points. Once
the period has expired, new information about the file's usage will have been
gathered, thereby extending the life of the file for an additional finite time. In
this way, file designs and reorganization points for a file with an unknown
lifetime can be determined.

"Global" solutions to the file design and reorganization problem for files
with unknown lifetimes are chimerical. The best that we may hope to attain are
series of optimal solutions over finite and consecutive periods of time.

6.2.5  Remarks 

A method for solving the problems of file design and file reorganization was
presented. Applications of the method to hash based and indexed sequential
files revealed that it is unnecessary to determine optimal loading factors and
reorganization points to a high accuracy. Although these results are
preliminary, they suggest that practicable solutions to file design and
reorganization problems may be easy to obtain.
6.3 Structure Selection and Index Selection


Two fundamental problems that arise in database design and file design are
structure selection and index selection. Structure selection is the problem of
choosing a storage structure for a collection of records. Choices include
unordered, hash based, and indexed sequential structures ([CODA78]). An
appropriate storage structure often enhances the performance of a file but is
alone insufficient to assure good performance. Aids in the form of index files may
also be needed. Determining which attributes to invert is the problem of index
selection.

A literature survey reveals that structure and index selection have been
addressed jointly, but only to a limited extent. A subproblem of structure
selection is cluster key selection, i.e., determining a key on which a file is to be
clustered.† Papers concerning structure selection have concentrated primarily
on cluster key selection. Cluster keys formed by a concatenation of several
attributes were examined in designs of multiple attribute trees ([KSY78]) and
sequential files ([Jak80]). Index selection was considered with cluster key
selection in designs of multikey hash based files ([RoLo74]) and multikey indexed
aggregate files ([LiYa78]). Other papers on index selection concerned unordered
files ([King74], [Schk75], [YuWo75], [AnBe77]), thereby avoiding cluster key
selection.

Unfortunately, the simplest form of structure selection has yet to be
studied, namely, choosing among unordered, hash based, and indexed sequential
structures with single attribute cluster keys. In the following sections, this
problem is studied jointly with index selection. Results of the study concern new
designs of inverted files.

† A structure clusters records on a key if records with similar-valued keys are
stored near each other. Simple files, for example, cluster records on cluster keys.

6.3.1 Structure of Inverted Files

An inverted file consists of a data file and zero or more index files. Each
index file is connected to the data file by precisely one linkset. In our study,
unordered, hash based, or indexed sequential structures implement the data
file. B+ trees implement the index files. For the case of hash based and
unordered data files, linksets are inverted lists or pointer arrays. For indexed
sequential data files, linksets are cellular serial structures where cells are
identified with single nodes.

All attributes - except for the attribute of the data file's cluster key - are
eligible for inversion.† Restrictions regarding the selection of cluster keys are
the  following.  For  hash  based  files,  cluster  keys  are  derived  from  identifiers. 
For  unordered  files,  there  are  no  restrictions  since  relative  location  keys  are  not 
based  on  attributes.  For  indexed  sequential  files,  any  attribute  may  serve  as  the 
cluster  key.  However,  special  arrangements  concerning  record  insertions  must 
be  observed  if  a  nonidentifier  (i.e.,  a  key  which  is  possessed  by  many  records)  is 
used. 

† Actually, there is no a priori reason why the attribute of the cluster key cannot
be inverted. We chose not to invert it for reasons of modeling accuracy. In the
linkset model, child records of linkset occurrences are assumed to be distributed
randomly over a file. However, when the link key of a linkset is the cluster key of
the child file, child records of a linkset occurrence are clustered within a few
nodes. Consequently, the linkset model significantly overestimates the costs of
accessing child records.

Suppose a record is to be inserted into one of a number of nodes, where all
such nodes are populated by records possessing the same nonidentifier.
Intuitively, an optimal record placement strategy would populate these nodes as
evenly as possible; deviations from a uniform loading negatively impact
performance. One way to achieve a (reasonably) uniform loading is to select
randomly a node in which to insert the record. This is equivalent to clustering
the file on an artificial identifier that is formed by the concatenation of the
nonidentifier and a fixed length random value; it is this random value that
selects the node in which the record is to be stored. Figure 6.8 illustrates an
indexed sequential file which uses this method to cluster employee records on
STATUS values.
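The construction of an artificial identifier can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis's implementation: the nonidentifier is taken to be a fixed-width byte string (so that byte-wise comparison orders keys by nonidentifier first), and the names `RNDLEN` and `artificial_identifier` are invented for the example.

```python
import random

RNDLEN = 4  # length of the random value in bytes (an assumed setting)


def artificial_identifier(nonidentifier: bytes, rng: random.Random) -> bytes:
    """Concatenate the nonidentifier with a fixed length random value."""
    suffix = rng.getrandbits(8 * RNDLEN).to_bytes(RNDLEN, "big")
    return nonidentifier + suffix


rng = random.Random(0)
statuses = [b"MGR", b"ENG", b"MGR", b"ENG", b"MGR"]   # hypothetical STATUS values
keys = sorted(artificial_identifier(s, rng) for s in statuses)
for k in keys:
    print(k[:3], k[3:].hex())
```

Sorting on the artificial identifier keeps records with equal STATUS adjacent, while the random suffix decides where among those records (and hence in which node) a new record falls.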

6.3.2  Operations  on  Inverted  Files 

Operations on inverted files may be expressed as actions on the set of data
file records. This enables transactions to be developed without knowledge of the
inverted file's implementation. Furthermore, such transactions will remain
functional even though changes to the implementation may occur. Common
operations include record retrieval, insertion, deletion, and modification.
Transactions which realize these operations are:

GET(Query, Sortkey, HOLD_phrase) - retrieve all records satisfying Query.
      Optionally, records can be returned in either ascending Sortkey order
      or in HOLD mode.

ADD(rec) - insert rec into the data file.

SUB(rec) - remove rec from the data file. rec is required to be in HOLD
      mode.

CHANGE(Attribute, New_value, rec) - set New_value to be the value of
      Attribute in rec. rec is required to be in HOLD mode.

A GET is processed by using available indices; ADD, SUB, and CHANGE
implicitly involve index file update and maintenance. Based on simple file,
linkset, and auxiliary operations, models of the above transactions are presented
in Appendix IV.

Figure 6.8. Clustering on Artificial Identifiers

The following transactions illustrate uses of these operations. Transaction I
prints all records satisfying query Q in order of ascending attribute A values:

      TRANSACTION I

      { FOR x := GET(Q,A) DO
            { print x; };
      }

Transaction II sets the value of attribute A to v in all records that satisfy query
Q':

      TRANSACTION II

      { FOR x := GET(Q',,HOLD) DO
            { CHANGE(A,v,x); };
      }

Transaction III deletes those records satisfying Q'':

      TRANSACTION III

      { FOR x := GET(Q'',,HOLD) DO
            { SUB(x); };
      }
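To make the implementation independence of these transactions concrete, the following sketch models the four operations on a toy in-memory inverted file. It is an assumption-laden illustration: the class `InvertedFile`, its dictionary-based indices, and the sample employee data are hypothetical, queries are restricted to conjunctions of equality clauses, and HOLD mode is omitted because the sketch is single-user.

```python
from collections import defaultdict


class InvertedFile:
    """Toy in-memory model: data records plus one inverted list per indexed attribute."""

    def __init__(self, indexed):
        self.records = {}                                    # record id -> dict of attributes
        self.index = {a: defaultdict(set) for a in indexed}  # attribute -> value -> record ids
        self._next_id = 0

    def add(self, rec):                                      # ADD(rec)
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = dict(rec)
        for attr, inv in self.index.items():
            inv[rec[attr]].add(rid)
        return rid

    def get(self, query, sortkey=None):                      # GET(Query, Sortkey)
        """query is a conjunction of equality clauses, given as {attribute: value}."""
        hits = set(self.records)
        for attr, value in query.items():
            if attr in self.index:                           # use an index when one exists
                hits &= self.index[attr].get(value, set())
        hits = [r for r in hits
                if all(self.records[r][a] == v for a, v in query.items())]
        key = (lambda r: self.records[r][sortkey]) if sortkey else None
        return sorted(hits, key=key)

    def sub(self, rid):                                      # SUB(rec)
        rec = self.records.pop(rid)
        for attr, inv in self.index.items():
            inv[rec[attr]].discard(rid)

    def change(self, attr, value, rid):                      # CHANGE(Attribute, New_value, rec)
        rec = self.records[rid]
        if attr in self.index:                               # index maintenance is implicit
            self.index[attr][rec[attr]].discard(rid)
            self.index[attr][value].add(rid)
        rec[attr] = value


f = InvertedFile(indexed=["dept"])
for name, dept, status in [("ann", "toys", "ft"), ("bob", "toys", "pt"), ("cy", "shoes", "ft")]:
    f.add({"name": name, "dept": dept, "status": status})

for x in f.get({"dept": "toys"}, sortkey="name"):   # Transaction I
    print(f.records[x])
for x in f.get({"status": "pt"}):                   # Transaction II
    f.change("dept", "books", x)
for x in f.get({"dept": "shoes"}):                  # Transaction III
    f.sub(x)
```

The three loops never mention which attributes are indexed, so changing the `indexed` list alters only the work done inside `get`, `add`, `sub`, and `change`.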

6.3.3  Objective  Function 

The configuration of an inverted file is the storage structure and cluster
key of its data file and the list of attributes that are inverted. Such information
is specified by values assigned to the parameters of Table 6.9. This collection of
values is called the inverted file's configuration descriptor.

d          implementation of the data file: indexed sequential, hash based,
           or unordered

c          index of the attribute on which the data file is clustered; 0 if
           the data file is unordered

Index_j    indicator variable that specifies whether attribute A_j is
           inverted (=1) or not (=0). Index_c = 0 since A_c, the cluster
           key of the data file, is not inverted

Table 6.9. Parameters of a Configuration Descriptor

The structures of an inverted file are defined by the descriptors of its data
file, index files, and linksets. These descriptors can be derived from a
configuration descriptor and from the values that are assigned to the data
parameters of Table 6.10. Appendices III and IV outline the reduction
procedures.

Assume that all queries are conjunctions of clauses. Let GET_i denote the
GET operation involving Query_i, and let CHANGE_j denote a CHANGE operation
involving attribute A_j. Statistics that characterize the arguments of GET_i are
listed in Table 6.10 under retrieval parameters. Statistics that characterize the
execution frequency of operations CHANGE_j, ADD, and SUB are listed in
Table 6.10 under usage parameters.

Suppose an inverted file has ϑ as its configuration descriptor. Let
STORAGE(ϑ) be its storage cost, and $(GET_i)(ϑ), $(CHANGE_j)(ϑ), $(ADD)(ϑ),
and $(SUB)(ϑ) be the cost of performing a GET_i, CHANGE_j, ADD, and SUB.†
The sum of the storage costs and operation costs (weighted by their execution
frequencies) yields the USAGE_COST of the inverted file:

    USAGE_COST(ϑ) = STORAGE(ϑ) + \sum_{i=1}^{q} nget_i × $(GET_i)(ϑ) + nadd × $(ADD)(ϑ)

                    + \sum_{j=1}^{k} nchg_j × $(CHANGE_j)(ϑ) + nsub × $(SUB)(ϑ)      (6.6)

† Expressions for these functions are given in Appendix IV.

data parameters

A            block access cost
S            block storage cost per month
Nrec         number of records in file
k            number of invertable attributes
V_j          number of distinct values assigned to attribute A_j
blocklen     length of block in bytes
ptrlen       length of pointer in bytes
rndlen       length of random value in bytes
maxreclen    maximum length of an index record in bytes
length_j     length (in bytes) of field for attribute A_j
datalen      combined length (in bytes) of all fields not corresponding to
             invertable attributes

retrieval parameters

q            number of queries
F_ij         selectivity of the A_j clause in QUERY_i; F_ij = 1 if no clause exists
EC_ij        indicator variable that designates if QUERY_i has a
             (A_j = value) clause (=1) or not (=0)
sorted_i     index of attribute on which output records are to be sorted for
             QUERY_i; 0 if output is unsorted

usage parameters

nadd         number of ADD operations per month
nsub         number of SUB operations per month
nchg_j       number of CHANGE_j operations per month
nget_i       number of GET_i operations per month

Table 6.10. Parameters of a Structure and Index Selection Problem

Minimizing USAGE_COST with respect to ϑ yields an efficient inverted file design
and a solution to a structure selection and index selection problem. Because
the index selection problem is known to be NP-hard ([Bat78]), it is unlikely that
there exist algorithms to perform this minimization efficiently.
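For small k the minimization can simply enumerate every configuration descriptor. The sketch below lists the descriptors allowed by the restrictions of Section 6.3.1 and evaluates the weighted sum of (6.6); with k = 3 and a single identifier it produces the 24 configurations evaluated in the experiments. The function names, the tuple encoding of a descriptor, and the two `toy_` cost functions are assumptions made for illustration; the actual cost expressions are those of Appendix IV.

```python
from itertools import product


def configurations(k, identifiers):
    """Yield (d, c, index) descriptors in the sense of Table 6.9."""
    for d in ("indexed sequential", "hash based", "unordered"):
        if d == "unordered":
            cluster_keys = [0]                     # no cluster attribute
        elif d == "hash based":
            cluster_keys = sorted(identifiers)     # hash keys derive from identifiers
        else:
            cluster_keys = range(1, k + 1)         # any attribute may cluster
        for c in cluster_keys:
            free = [j for j in range(1, k + 1) if j != c]   # Index_c is fixed at 0
            for bits in product((0, 1), repeat=len(free)):
                index = [0] * k
                for j, b in zip(free, bits):
                    index[j - 1] = b
                yield d, c, tuple(index)


def usage_cost(cfg, storage, op_cost, nget, nchg, nadd, nsub):
    """Equation (6.6): storage cost plus frequency-weighted operation costs."""
    total = storage(cfg) + nadd * op_cost("ADD", 0, cfg) + nsub * op_cost("SUB", 0, cfg)
    total += sum(n * op_cost("GET", i, cfg) for i, n in enumerate(nget, start=1))
    total += sum(n * op_cost("CHANGE", j, cfg) for j, n in enumerate(nchg, start=1))
    return total


def toy_storage(cfg):
    """Hypothetical storage cost: one unit for the data file, one per index."""
    return 1 + sum(cfg[2])


def toy_op_cost(op, n, cfg):
    """Hypothetical operation costs, chosen only to exercise usage_cost."""
    d, c, index = cfg
    if op == "GET":
        return 1 if (n == c or index[n - 1]) else 10
    if op == "CHANGE":
        return 1 + index[n - 1]
    return 1 + sum(index)                          # ADD and SUB touch every index


configs = list(configurations(k=3, identifiers={1}))
best = min(configs, key=lambda cfg: usage_cost(
    cfg, toy_storage, toy_op_cost, [9000, 500, 500], [0, 200, 200], 300, 300))
print(len(configs), best)
```

The count grows as k times 2^k, so exhaustive evaluation of this kind is practical only for a handful of invertable attributes.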

6.3.4 Experimental Results

Suppose a file contains 20000 records and has three attributes that are
eligible for inversion: an identifier A_1 and two nonidentifiers A_2 and A_3. Further,
suppose a GET operation is one of the following: GET_1 - retrieving individual
records satisfying (A_1 = a_1), GET_2 - retrieving records satisfying (A_2 = a_2) sorted
on A_1, or GET_3 - retrieving records satisfying (A_3 = a_3). Statistics describing
these and other characteristics of the file are given in Table 6.11.

The number of records returned by a GET_i is approximated by Nrec/V_i. In
order to cover an interesting spectrum of situations, values for V_2 and V_3 (the
number of distinct values that are assigned to A_2 and A_3) are taken from the set
{5, 25, 125, 625, 3125, 15625}. At one extreme (V_i = 5), nonidentifier queries
qualify approximately 20000/5 = 4000 records; at the other (V_i = 15625),
20000/15625 = 1.3 records are qualified.
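The arithmetic behind these extremes can be tabulated directly. The snippet is only a worked check of the Nrec/V_i approximation; the helper name `expected_hits` is an assumption.

```python
NREC = 20000                                  # records in the file
V_VALUES = [5 ** e for e in range(1, 7)]      # 5, 25, ..., 15625


def expected_hits(nrec: int, v: int) -> float:
    """Approximate number of records qualified by an equality clause: Nrec / V."""
    return nrec / v


for v in V_VALUES:
    print(v, round(expected_hits(NREC, v), 1))
```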

Inverted file configurations that minimize USAGE_COST are displayed in
Table 6.12 along with the minimal USAGE_COST.† In all cases, the recommended
storage structures are indexed sequential. Tables 6.13-17 show how other
inverted file configurations compare to those in Table 6.12. The file designs that
correspond to blank entries in these tables have USAGE_COSTs that

† There are O(k 2^k) distinct configurations of an inverted file with k invertable
attributes. Solutions were obtained by evaluating every configuration (24 in all).
A       .0013     blocklen    4096             nadd   300
S       .2773     ptrlen      4                nsub   300
Nrec    20000     rndlen      4                nget   [ 9000, 500, 500 ]
k       3         maxreclen   508              nchg   [ 0, 200, 200 ]
q       3         datalen     75               V      [ 20000, V_2, V_3 ]
                  length      [ 10, 7, 8 ]

        | (20000)^-1   1          1        |              | 1  0  0 |
F   =   | 1            (V_2)^-1   1        |      EC  =   | 0  1  0 |
        | 1            1          (V_3)^-1 |              | 0  0  1 |

sorted = [ 0, 1, 0 ]

Table 6.11. Input Values to a Structure and Index Selection Problem


min cost (rows and columns are indexed by the values of V_2 and V_3)

             5       25      125     625     3125    15625

5            828     775     809     537     522     527
25           600     549     383     310     296     301
125          384     331     321     248     234     239
625          311     258     248     245     231     236
3125         296     243     234     230     216     221
15625        300     247     238     234     221     225

[Each entry also carried a structure code of the form is_c^{x,y}; the indices
attached to the individual entries are illegible.]

structure abbreviation: is_c^{x,y} - indexed sequential file,
cluster key A_c, attributes A_x and A_y are inverted

Table 6.12. Optimal Inverted File Designs

[Tables 6.13, 6.14, and 6.15: each caption begins "Performance of an Indexed
Sequential" and is truncated; the table bodies are not recoverable.]
are negligibly different from the costs of the optimal designs in Table 6.12. In
such cases, these designs may also be considered optimal.

Provided that queries qualify few records (V_2 ≥ 625 and V_3 ≥ 625) and are
processed by inverted lists, all data file implementations exhibit essentially the
same performance (i.e., within 11%). However, if queries qualify many records
(V_2 < 625 or V_3 < 625), structure selection defaults to indexed sequential designs
and cluster key selection becomes critical. Hash based and unordered files
perform poorly in comparison.

The  cluster  key  selections  of  Table  6.12  can  be  understood  in  the  following 
way.  Query  processing  accounts  for  the  major  portion  of  the  file’s  usage  cost. 
Clustering  on  an  attribute  reduces  the  cost  of  processing  queries  involving  that 
attribute;  processing  costs  of  other  queries  remain  essentially  unchanged. 
Therefore,  the  data  file  is  clustered  on  that  attribute  which  contributes  most  to 
lowering  the  file’s  usage  cost. 

Clustering on A_1 does not reduce the cost of a GET_1 since a single record is
retrieved. However, it does reduce the cost of processing a GET_2 which returns
4000 records sorted on A_1 (i.e., V_2 = 5). This means that if a large fraction of a
file is to be output in sorted order, it is best to store the records of the file in
that order. It also means that clustering on a sorting attribute can be better
than clustering on a qualifying attribute.

Reduction of storage costs becomes important when query processing costs
are affected marginally by clustering. A_1 is the recommended cluster key in
such cases (i.e., V_2 ≥ 625 and V_3 ≥ 625).

Clustering on A_2 reduces the cost of processing a GET_2. The same holds for
A_3 and GET_3. In general, the query that qualifies the most records benefits the
most from clustering. It follows that A_2 is the cluster key whenever the number
of records output by a GET_2 exceeds that of GET_3 (i.e., V_2 < V_3); otherwise A_3 is
the cluster key.†

Another interesting result concerns index selection and hash based data
files. Recall that a scan accesses all records of a node before accessing records
of other nodes. As a consequence, blocks containing overflow records of
different nodes are accessed redundantly. Recall that no block is accessed
more than once when records are retrieved via inverted lists. Therefore,
duplicate accesses can be avoided by always processing queries using inverted
lists. This explains why more attributes are inverted for hash based files than for
unordered files (cf. entries V_2 = V_3 = 5 in Tables 6.16-17). These observations
do not apply to indexed sequential data files since linksets are cellular serial
structures rather than inverted lists.

6.3.5  Remarks 

Computation experiments were conducted to determine efficient
implementations for a hypothetical file. Although only three types of queries were
considered, it is believed that the results of our studies are indicative of results
obtained when many queries are considered. Our results indicate that selecting
a storage structure for a collection of records is not critical to good
performance if anticipated queries qualify few records and are processed by
inverted lists. If queries qualify many records, indexed sequential designs are
preferred and clustering records becomes important.

These results are preliminary. The file used in the computation
experiments was relatively static, i.e., there were few insertions, deletions, and
updates. This work may be extended by examining implementations of highly
dynamic files and including the possibility of file reorganizations.

† This simple explanation is due to the equal execution frequencies of GET_2 and
GET_3. Clustering on A_2 and A_3 would be assessed differently if frequencies were
unequal.




CHAPTER 7. SUMMARY AND CONCLUSIONS


A framework for a unifying model of physical databases has been presented.
It was shown that a wide variety of interesting problems can be studied and a
rich collection of works can be synthesized by the proposed model. Inherent to
a unifying model is the ability to generalize and extend previous work, and this
has been done. Another advantage of the approach taken here is that analytic
formulations of problems can be developed quickly using a compact notation. It
is believed that this framework provides a useful means for unifying the study of
database performance.

7.1  Contributions  of  the  Thesis 

Perhaps  the  most  significant  contribution  of  this  thesis  is  a  unification  of 
many  formerly  disparate  works.  This  was  accomplished  with  the  introduction  of 
new  modeling  techniques  and  refinements  of  previously  known  techniques. 

Decomposition was recognized as an important tool for describing networks of interconnected files. Beforehand, networks were modeled in an ad hoc fashion. For example, it was common to treat index files in a manner quite different from that of other files (see [Yao77], [Yao79]). Consequently, operations on index files were duplicated as operations on nonindex files and network description itself became unnecessarily complex. Decomposition is a simple way of avoiding these problems.

Expected flow graphs were introduced as a technique for modeling the performance evolution of files. Applications of the technique showed that formerly unrelated results could be derived from a common set of assumptions and that these results could be generalized. Moreover, it was discovered that certain statistics for large files could be approximated accurately by corresponding statistics of small files. This result is important because the cost of simulating or performing theoretical calculations on large files is prohibitive. Prior to our work, most analyses of files assumed that insertions and deletions affected performance minimally or they ignored the effects of performance evolution. Now that methods for predicting evolution are available, more realistic problems can be examined.

In  connection  with  file  evolution,  a  new  method  was  proposed  for  solving  the 
problem  of  file  reorganization.  Earlier  methods  were  based  on  either  heuristics 
or  restrictive  assumptions.  Our  method  was  based  on  neither.  When  combined 
with  file  evolution,  solutions  to  previously  unsolved  problems  can  be  obtained. 
This  was  done  in  the  case  of  determining  optimal  file  designs  and  reorganization 
points  for  hash  based  and  indexed  sequential  files. 

Transactions were modeled as sequences of statements involving database operations. This in itself is not new, for transactions always have been expressed in this manner. However, it is new in the context of relating transactions to expressions that estimate their processing cost. The notation used to describe transactions was based on Pidgin ALGOL so that 1) transactions could be modeled in a straightforward manner and 2) models of transactions could be easily understood. The only previous work that modeled transactions ([Yao79]) employed a graphical notation which was difficult to use and understand. It is also worth noting that the notation introduced in [Yao79] was concerned solely with the modeling of query processing strategies. Our work goes beyond this to include the modeling of transactions that modify databases.
7.2  Further Work


There are a number of important ways in which this work can be extended. The simple file model deals exclusively with single keyed structures, i.e., structures based on a single cluster key. There are, however, multikeyed structures such as multikey hash based [RoLo74] and multikey indexed aggregate [LiYa78]. Future simple file models should include these structures.

The linkset model deals with structures that connect parent and child records via pointers. Another important linking method, common in IMS databases, is to cluster child records with their parent records (see [Schk77]). The inclusion of clustering as a linking method would be an important addition to future linkset models.

The  file  evolution  model  may  be  extended  in  two  ways.  First,  an  analysis  of 
dynamic  hash  based  files  was  not  considered  in  this  thesis.  Second,  much  work 
is  needed  to  improve  the  efficiency  of  algorithms  that  generate  file  design 
tables.  Presently,  these  algorithms  are  unacceptably  slow. 

For  the  transaction  model,  transactions  that  sequentially  read  two  or  more 
files  simultaneously  cannot  be  described  easily.  Transactions  that  merge  files 
and  search  transposed  files  are  examples.  A  notation  for  modeling  simultaneous 
retrieval  from  multiple  files  is  needed. 

Lohman and Muckstadt ([LoMu77]) showed that the problem of file reorganization was related to problems concerning backup, checkpointing, and batch updating. Future work should assess the applicability of dynamic programming to these other problems.

Finally, some important work was only indirectly addressed in this thesis. This work concerns the translation of high level queries into sequences of database operations. Wong and Youssefi ([WoYo76]) demonstrated that query decomposition - separating a query into its single file constituents - is an important technique in processing multifile queries. It is believed that query decomposition and our notion of database decomposition are strongly related. Further research in this area is needed.




REFERENCES 


[AHU74]  Aho, A.V., Hopcroft, J.E., and Ullman, J.D. The Design and Analysis of
Computer Algorithms, Addison-Wesley, Reading, Mass., 1974.

[AnBe77]  Anderson, H.D. and Berra, P.B. "Minimum Cost Selection of Secondary
Indices for Formatted Files", ACM Trans. Database Syst. 2,1 (March
1977), pp. 68-90.

[Astr76]  Astrahan, M.M., et al. "System R: Relational Approach to Database
Management", ACM Trans. Database Syst. 1,2 (June 1976), pp. 97-137.

[BaMc72]  Bayer, R. and McCreight, E.M. "Organization and Maintenance of Large
Ordered Indices", Acta Informatica 1,3 (1972), pp. 173-189.

[Bat78]  Batory, D.S. "On the Complexity of the Index Selection Problem",
unpublished manuscript, 1978.

[Bat79]  Batory, D.S. "On Searching Transposed Files", ACM Trans. Database Syst.
4,4 (Dec. 1979), pp. 531-544.

[BlEs77]  Blasgen, M.W. and Eswaran, K.P. "Storage and Access in Relational
Databases", IBM Syst. Jour. 16,4 (1977), pp. 363-377.

[Chr81]  Christodoulakis, S., Ph.D. Thesis, Dept. of Computer Science, University
of Toronto. To appear.

[ClYa78]  Claybrook, B.G. and Yang, C-S. "Efficient Algorithms for Answering
Queries With Unsorted Multilists", Infor. Syst. 3 (1978), pp. 93-97.

[Codd72]  Codd, E.F. "Relational Completeness of Data Base Sublanguages", in
Data Base Systems, (R. Rustin, ed.), Prentice Hall, Englewood Cliffs,
N.J., pp. 65-98.

[Date77]  Date, C.J. An Introduction to Database Systems, Addison-Wesley,
Reading, Mass., 1977.

[DST75]  Deutscher, R.F., Sorenson, P.G., and Tremblay, J.P. "Distribution
Dependent Hashing Functions and Their Characteristics", Proc. ACM
SIGMOD 1975, pp. 224-236.

[FNPS79]  Fagin, R., Nievergelt, J., Pippenger, N., and Strong, H.R. "Extendible
Hashing - A Fast Access Method for Dynamic Files", ACM Trans.
Database Syst. 4,3 (Sept. 1979), pp. 315-344.

[Haer78]  Haerder, T. "Implementing a Generalized Access Path Structure for a
Relational Database System", ACM Trans. Database Syst. 3,3 (Sept.
1978), pp. 285-298.

[Hoff75]  Hoffer, J.A. "A Clustering Approach to the Generation of Subfiles for the
Design of a Computer Database", Ph.D. Th., Cornell U., Ithaca, N.Y.,
1975.

[HsHa70]  Hsiao, D. and Harary, F. "A Formal System for Information Retrieval
from Files", Comm. ACM 13,2 (Feb. 1970), pp. 67-73.

[IBM76]  Introduction to IBM Direct-Access Storage Devices and Organization
Methods, IBM Corp., White Plains, N.Y., 1976.

[Jak80]  Jakobsson, M. "Reducing Block Accesses in Inverted Files by Partial
Clustering", Infor. Syst. 5 (1980), pp. 1-5.

[KeLa74]  Keehn, D.G. and Lacy, J.O. "VSAM Data Set Design Parameters", IBM
Syst. Jour. 13,3 (1974), pp. 186-212.

[King74]  King, W.F. "On the Selection of Indices for a File", Rep. RJ1341, IBM, San
Jose, Calif., 1974.

[Klei75]  Kleinrock, L. Queueing Systems, Vol. 1: Theory, Wiley-Interscience, New
York, N.Y., 1975.

[KSY78]  Kashyap, R.L., Subas, S.K.C., and Yao, S.B. "Analysis of the Multiple
Attribute Tree Data Base Organization", IEEE Trans. Software
Engineering, Vol. SE-3, #6 (Nov. 1977), pp. 451-467.

[Lar78]  Larson, P. "Dynamic Hashing", BIT 18 (1978), pp. 184-201.

[Lit78]  Litwin, W. "Virtual Hashing: A Dynamically Changing Hashing", Proc. Very
Large Data Bases Conf., Berlin 1978, pp. 517-523.

[LiYa78]  Liou, J.H. and Yao, S.B. "Multidimensional Clustering for Database
Organizations", Infor. Syst. 2 (1978), pp. 187-198.

[LoMu77]  Lohman, G.M. and Muckstadt, J.A. "Optimal Policy for Batch
Operations: Backup, Checkpointing, Reorganization, and Updating", ACM
Trans. Database Syst. 2,3 (Sept. 1977), pp. 209-222.

[MaSe77]  March, S.T. and Severance, D.G. "The Determination of Efficient
Record Segmentations and Blocking Factors for Shared Data Files",
ACM Trans. Database Syst. 2,3 (Sept. 1977), pp. 279-296.

[NaMi78]  Nakamura, T. and Mizoguchi, T. "An Analysis of Storage Utilization
Factor in Block Split Data Structuring Scheme", Proc. Very Large Data
Bases Conf., Berlin 1978, pp. 489-495.

[Nia78]  Niamir, B. "Attribute Partitioning in a Self-Adaptive Relational Database
System", Rep. MIT/LCS/TR-192, M.Sc. Th., M.I.T., Cambridge, Mass.,
1978.

[Rami80]  Ramirez, R.J. "Efficient Algorithms for Selecting Efficient Data Storage
Structures", Ph.D. Th., U. of Waterloo, Waterloo, Ont., 1980.

[RiTh74]  Ritchie, D.M. and Thompson, K. "The UNIX Time-Sharing System",
Comm. ACM 17,7 (July 1974), pp. 365-375.

[RoLo74]  Rothnie, J.B. and Lozano, T. "Attribute Based File Organization in a
Paged Memory Environment", Comm. ACM 17,2 (Feb. 1974), pp. 63-69.

[Schk75]  Schkolnick, M. "The Optimal Selection of Secondary Indices for Files",
Infor. Syst. 1 (1975), pp. 141-146.

[Schk77]  Schkolnick, M. "A Clustering Algorithm for Hierarchical Structures",
ACM Trans. Database Syst. 2,1 (March 1977), pp. 27-44.

[SeDu76]  Severance, D.G. and Duhne, R. "A Practitioner's Guide to Addressing
Algorithms", Comm. ACM 19,6 (June 1976), pp. 314-326.

[SeLo76]  Severance, D.G. and Lohman, G.M. "Differential Files: Their Application
to the Maintenance of Large Databases", ACM Trans. Database Syst.
1,3 (Sept. 1976), pp. 256-267.

[Sev72]  Severance, D.G. "Some Generalized Modeling Structures for use in
Design of File Organizations", Ph.D. Th., U. of Michigan, Ann Arbor,
Mich., 1972.

[ShGo76]  Shneiderman, B. and Goodman, V. "Batched Searching of Sequential
and Tree Structured Files", ACM Trans. Database Syst. 1,3 (Sept.
1976), pp. 268-275.

[Shn73]  Shneiderman, B. "Optimum Data Base Reorganization Points", Comm.
ACM 16,6 (June 1973), pp. 362-365.

[TsLo77]  Tsichritzis, D.C. and Lochovsky, F.H. Data Base Management Systems,
Academic Press, New York, 1977.

[Tuel78]  Tuel, W.G. "Optimum Reorganization Points for Linearly Growing Files",
ACM Trans. Database Syst. 3,1 (March 1978), pp. 32-40.

[Van73]  Van der Pool, J.A. "Optimum Storage Allocation for a File in Steady
State", IBM Jour. Res. Dev., Jan. 1973, pp. 27-38.

[WoYo76]  Wong, E. and Youssefi, K. "Decomposition - A Strategy for Query
Processing", ACM Trans. Database Syst. 1,3 (Sept. 1976), pp. 223-241.

[Yao74]  Yao, S.B. "Evaluation and Optimization of File Organizations Through
Analytic Modeling", Ph.D. Th., U. of Michigan, Ann Arbor, Mich.,
1974.

[Yao77]  Yao, S.B. "An Attribute Based Model for Database Access Cost Analysis",
ACM Trans. Database Syst. 2,1 (March 1977), pp. 45-67.

[Yao79]  Yao, S.B. "Optimization of Query Evaluation Algorithms", ACM Trans.
Database Syst. 4,2 (June 1979), pp. 133-155.

[YDT76]  Yao, S.B., Das, K.S., and Teorey, T.J. "A Dynamic Database
Reorganization Algorithm", ACM Trans. Database Syst. 1,2 (June 1976), pp.
159-174.

[YuWo75]  Yue, P.C. and Wong, C.K. "Storage Cost Considerations in Secondary
Index Selection", Int. Jour. Computer and Information Sci., Vol. 4, #4
(1975), pp. 307-327.



APPENDIX  I.  MODIFIED  RETRIEVAL  COST  FUNCTION 


A modified scan accesses all nodes of a simple file. Its execution cost is
approximated by MSC:

$$MSC = \sum_{i=0}^{L-1} NSC(Z_i, i)$$
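
As a minimal sketch of how the MSC sum might be evaluated, the following Python fragment sums a node-scan cost over levels 0 through L-1. The cost function NSC is defined elsewhere in the thesis and is not reproduced here, so it is passed in as a parameter; the names `modified_scan_cost` and `unit_node_cost`, and the node counts used, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def modified_scan_cost(z: Sequence[float],
                       nsc: Callable[[float, int], float]) -> float:
    """MSC = sum over levels i = 0..L-1 of NSC(Z_i, i).

    z[i] is the number of nodes on level i; nsc(nodes, level) stands in
    for the thesis's NSC cost function (an assumption of this sketch).
    """
    return sum(nsc(z_i, i) for i, z_i in enumerate(z))


def unit_node_cost(nodes: float, level: int) -> float:
    """Hypothetical stand-in for NSC: one unit of cost per node."""
    return float(nodes)


# Illustrative two-level file: 100 nodes on level 0 and 10 on level 1.
msc = modified_scan_cost([100, 10], unit_node_cost)
print(msc)
```

Passing NSC as a callable keeps the sketch independent of any particular block access cost model.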

A modified partial scan accesses the same base file nodes as a partial scan.
It also accesses all ancestors of these nodes. Its execution cost is approximated
by MPS:

MPS{r)  = 

<=o 


A modified range search accesses the same base file nodes as a range
search. It also accesses all ancestors of these nodes. Let Cn(r,i) be the number
of consecutive nodes on level i that are accessed in a modified range search for r
records. Cn(r,i) is estimated by: ¹

$$Cn(r,i) = \frac{A'(r, r, i-1)}{H_i + G_i}$$


The  execution  cost  of  a  modified  range  search  is  approximated  by  MRS: 


$$MRS(r) = \sum_{i=0}^{L-2} NSC(Cn(r,i), i)$$


¹ Note that Cn(r,i) is a generalization of Cn(r) of Section 2.3.2.


,  Zr  _i  +  Cn  (r,L  —  1)  . 

NSCi-J^ - g - .L-1) 


NSC(,Cn{T,L-l).L-l) 


if  Rindex—0 
Concluding, the modified retrieval cost function is:


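$$MRET(F,q) = \begin{cases}
MSC & \text{if } Ss = \text{scan} \\
MPS(ef \times N) & \text{if } Ss = \text{partial scan} \\
MRS(kf \times N) & \text{if } Ss = \text{range search} \\
KS(kf \times N_k,\ \min(ef, kf) \times N) & \text{if } Ss = \text{cluster key search}
\end{cases}$$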
APPENDIX II.  FILE DESIGN TABLES


The following pages contain some file design tables for hash based, indexed aggregate, indexed sequential, and B+ tree files. Tables are selected according to three values that characterize the structure's initial configuration. These values are: 1) R, the record capacity of a primary block; 2) D/I, the relative intensity of deletions to insertions, where D/I=1 models a steady state or zero net growth file and D/I=0 models a file that experiences no deletions; and 3) N0/Z, the initial number of records per node.

Row 0 of each table lists initial values for selected file parameters. Each subsequent row lists the values of these parameters after, on the average, one more record has been inserted into each node of the file. That is, the average number of insertions per node is indexed by row numbers. The number of deletions per node which have occurred concurrently with each insertion is governed by D/I. For example, if D/I=.5, row 7 lists parameter values after an average of 7 insertions and 3.5 deletions have occurred per node. For files where overflow occurs, the value of G for row j is obtained using:

$$G = (\text{expected number of records per node}) - H = (N_0/Z + (1 - D/I) \times j) - H$$

where N0/Z and D/I are used to identify the design table, and the value of H is taken from row j of the table. For other details, see Section 3.3.3.
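
The calculation of G from a design table row can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the sample value H = 4.35 is read from row 7 of the hash based table for R = 5, D/I = 0.5, N0/Z = 3.0.

```python
def expected_records_per_node(n0_per_z: float, d_over_i: float, row: int) -> float:
    """Expected records per node after `row` insertions per node.

    Each insertion is accompanied, on average, by D/I deletions, so the
    net growth per row is (1 - D/I).
    """
    return n0_per_z + (1.0 - d_over_i) * row


def g_value(n0_per_z: float, d_over_i: float, row: int, h: float) -> float:
    """G = (expected number of records per node) - H, with H taken from row j."""
    return expected_records_per_node(n0_per_z, d_over_i, row) - h


# R = 5, D/I = 0.5, N0/Z = 3.0, row 7: 7 insertions and 3.5 deletions per node.
g = g_value(3.0, 0.5, 7, 4.35)
print(round(g, 2))  # 2.15
```

Here the expected number of records per node is 3.0 + 0.5 x 7 = 6.5, so G = 6.5 - 4.35 = 2.15.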
HASH BASED FILE DESIGN TABLES


row        H     Pou   Pfull       Ω
  0     2.88    0.08    0.18    0.06
  1     3.62    0.21    0.37    0.17
  2     4.15    0.38    0.56    0.34
  3     4.51    0.56    0.72    0.58
  4     4.73    0.71    0.84    0.88
  5     4.86    0.82    0.91    1.21
  6     4.93    0.89    0.95    1.58
  7     4.96    0.94    0.97    1.97
  8     4.98    0.97    0.99    2.38

Table II.1.  R = 5  D/I = 0.0  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     2.88    0.08    0.18    0.08
  1     3.21    0.17    0.22    0.13
  2     3.51    0.27    0.29    0.21
  3     3.75    0.37    0.36    0.31
  4     3.96    0.47    0.42    0.42
  5     4.12    0.57    0.46    0.55
  6     4.25    0.66    0.53    0.69
  7     4.35    0.73    0.57    0.83
  8     4.43    0.79    0.61    0.99

Table II.2.  R = 5  D/I = 0.5  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     2.88    0.08    0.18    0.06
  1     2.80    0.13    0.12    0.10
  2     2.75    0.16    0.11    0.12
  3     2.73    0.18    0.11    0.13
  4     2.71    0.20    0.11    0.14
  5     2.70    0.20    0.11    0.14
  6     2.70    0.21    0.10    0.14
  7     2.69    0.21    0.10    0.14
  8     2.69    0.22    0.10    0.14

Table II.3.  R = 5  D/I = 1.0  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     4.15    0.38    0.56    0.34
  1     4.51    0.56    0.72    0.58
  2     4.73    0.71    0.84    0.88
  3     4.86    0.82    0.91    1.21
  4     4.93    0.89    0.95    1.58
  5     4.96    0.94    0.97    1.97
  6     4.98    0.97    0.99    2.38
  7     4.99    0.98    0.99    2.80
  8     5.00    0.99    1.00    3.24

Table II.4.  R = 5  D/I = 0.0  N0/Z = 5.0


row        H     Pou   Pfull       Ω
  0     4.15    0.38    0.56    0.34
  1     4.21    0.52    0.53    0.52
  2     4.30    0.63    0.55    0.67
  3     4.38    0.71    0.58    0.83
  4     4.45    0.78    0.61    0.99
  5     4.51    0.84    0.64    1.16
  6     4.55    0.88    0.67    1.34
  7     4.59    0.91    0.69    1.52
  8     4.63    0.94    0.71    1.71

Table II.5.  R = 5  D/I = 0.5  N0/Z = 5.0


row        H     Pou   Pfull       Ω
  0     4.15    0.38    0.56    0.34
  1     3.90    0.49    0.38    0.45
  2     3.78    0.54    0.33    0.50
  3     3.72    0.58    0.31    0.52
  4     3.67    0.60    0.30    0.54
  5     3.65    0.62    0.29    0.54
  6     3.63    0.63    0.29    0.55
  7     3.62    0.64    0.29    0.55
  8     3.61    0.64    0.28    0.55

Table II.6.  R = 5  D/I = 1.0  N0/Z = 5.0
HASH BASED FILE DESIGN TABLES


row        H     Pou   Pfull       Ω
  0    18.00    0.00    0.00    0.00
  1    18.99    0.00    0.01    0.00
  2    19.98    0.01    0.02    0.00
  3    20.95    0.02    0.03    0.00
  4    21.91    0.04    0.05    0.01
  5    22.84    0.06    0.09    0.02
  6    23.74    0.09    0.13    0.03
  7    24.59    0.13    0.17    0.05
  8    25.39    0.18    0.23    0.07

Table II.7.  R = 30  D/I = 0.0  N0/Z = 18.0


row        H     Pou   Pfull       Ω
  0    18.00    0.00    0.00    0.00
  1    18.49    0.00    0.00    0.00
  2    18.99    0.01    0.01    0.00
  3    19.48    0.01    0.01    0.00
  4    19.97    0.01    0.01    0.00
  5    20.45    0.02    0.02    0.00
  6    20.93    0.03    0.02    0.01
  7    21.41    0.04    0.03    0.01
  8    21.88    0.05    0.04    0.01

Table II.8.  R = 30  D/I = 0.5  N0/Z = 18.0


row        H     Pou   Pfull       Ω
  0    18.00    0.00    0.00    0.00
  1    17.99    0.00    0.00    0.00
  2    17.99    0.00    0.00    0.00
  3    17.99    0.01    0.00    0.00
  4    17.99    0.01    0.00    0.00
  5    17.99    0.01    0.00    0.00
  6    17.98    0.01    0.00    0.00
  7    17.98    0.01    0.00    0.00
  8    17.98    0.01    0.00    0.00

Table II.9.  R = 30  D/I = 1.0  N0/Z = 18.0


row        H     Pou   Pfull       Ω
  0    27.89    0.45    0.53    0.28
  1    28.33    0.53    0.60    0.37
  2    28.70    0.60    0.67    0.47
  3    29.00    0.67    0.73    0.58
  4    29.24    0.73    0.79    0.72
  5    29.43    0.78    0.83    0.87
  6    29.58    0.83    0.87    1.03
  7    29.70    0.87    0.90    1.21
  8    29.79    0.90    0.93    1.41

Table II.10.  R = 30  D/I = 0.0  N0/Z = 30.0


row        H     Pou   Pfull       Ω
  0    27.89    0.45    0.53    0.28
  1    27.96    0.51    0.43    0.35
  2    28.08    0.56    0.41    0.41
  3    28.21    0.61    0.41    0.46
  4    28.34    0.65    0.43    0.52
  5    28.46    0.68    0.44    0.59
  6    28.58    0.72    0.45    0.65
  7    28.68    0.75    0.47    0.73
  8    28.77    0.78    0.48    0.80

Table II.11.  R = 30  D/I = 0.5  N0/Z = 30.0


row        H     Pou   Pfull       Ω
  0    27.89    0.45    0.53    0.28
  1    27.57    0.50    0.30    0.33
  2    27.39    0.53    0.24    0.36
  3    27.25    0.56    0.21    0.38
  4    27.14    0.58    0.20    0.39
  5    27.04    0.60    0.19    0.41
  6    26.96    0.61    0.18    0.42
  7    26.89    0.62    0.17    0.43
  8    26.83    0.64    0.17    0.44

Table II.12.  R = 30  D/I = 1.0  N0/Z = 30.0
INDEXED AGGREGATE FILE DESIGN TABLES


row        H     Pou   Pfull       Ω
  0     3.00    0.00    0.00    0.00
  1     3.86    0.10    0.37    0.05
  2     4.33    0.32    0.67    0.26
  3     4.58    0.51    0.82    0.61
  4     4.72    0.64    0.89    1.03
  5     4.81    0.74    0.94    1.51
  6     4.86    0.80    0.96    2.03
  7     4.90    0.85    0.97    2.57
  8     4.92    0.88    0.98    3.13

Table II.13.  R = 5  D/I = 0.0  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     3.00    0.00    0.00    0.00
  1     3.40    0.07    0.23    0.04
  2     3.61    0.21    0.38    0.17
  3     3.75    0.33    0.46    0.36
  4     3.86    0.43    0.51    0.58
  5     3.94    0.50    0.55    0.82
  6     4.02    0.56    0.59    1.07
  7     4.09    0.61    0.62    1.33
  8     4.15    0.66    0.64    1.59

Table II.14.  R = 5  D/I = 0.5  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     3.00    0.00    0.00    0.00
  1     2.94    0.05    0.14    0.03
  2     2.81    0.12    0.18    0.10
  3     2.71    0.17    0.19    0.17
  4     2.63    0.20    0.19    0.23
  5     2.58    0.22    0.19    0.27
  6     2.54    0.23    0.19    0.30
  7     2.51    0.24    0.19    0.33
  8     2.49    0.25    0.19    0.34

Table II.15.  R = 5  D/I = 1.0  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     5.00    0.00    1.00    0.00
  1     5.00    0.61    1.00    0.26
  2     5.00    0.82    1.00    0.61
  3     5.00    0.91    1.00    1.03
  4     5.00    0.95    1.00    1.48
  5     5.00    0.97    1.00    1.96
  6     5.00    0.98    1.00    2.46
  7     5.00    0.99    1.00    2.98
  8     5.00    0.99    1.00    3.51

Table II.16.  R = 5  D/I = 0.0  N0/Z = 5.0


row        H     Pou   Pfull       Ω
  0     5.00    0.00    1.00    0.00
  1     4.69    0.53    0.74    0.22
  2     4.57    0.69    0.69    0.46
  3     4.52    0.76    0.67    0.71
  4     4.51    0.81    0.68    0.96
  5     4.51    0.84    0.69    1.22
  6     4.52    0.87    0.70    1.48
  7     4.53    0.89    0.71    1.74
  8     4.55    0.90    0.72    2.01

Table II.17.  R = 5  D/I = 0.5  N0/Z = 5.0


row        H     Pou   Pfull       Ω
  0     5.00    0.00    1.00    0.00
  1     4.36    0.44    0.54    0.18
  2     4.04    0.53    0.43    0.33
  3     3.85    0.57    0.39    0.44
  4     3.72    0.59    0.37    0.54
  5     3.62    0.59    0.35    0.61
  6     3.56    0.60    0.34    0.67
  7     3.51    0.60    0.34    0.71
  8     3.47    0.60    0.34    0.75

Table II.18.  R = 5  D/I = 1.0  N0/Z = 5.0
INDEXED SEQUENTIAL FILE DESIGN TABLES


row        H     Pou   Pfull       Ω
  0     3.00    0.00    0.00    0.00
  1     3.86    0.10    0.37    0.05
  2     4.33    0.32    0.67    0.26
  3     4.58    0.51    0.82    0.61
  4     4.72    0.64    0.89    1.03
  5     4.81    0.74    0.94    1.51
  6     4.86    0.80    0.96    2.03
  7     4.90    0.85    0.97    2.57
  8     4.92    0.88    0.98    3.13

Table II.19.  R = 5  D/I = 0.0  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     3.00    0.00    0.00    0.00
  1     3.40    0.07    0.23    0.04
  2     3.61    0.21    0.38    0.17
  3     3.74    0.33    0.45    0.37
  4     3.83    0.43    0.49    0.60
  5     3.91    0.50    0.51    0.65
  6     3.97    0.56    0.53    1.12
  7     4.02    0.62    0.55    1.39
  8     4.06    0.66    0.56    1.68

Table II.20.  R = 5  D/I = 0.5  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     3.00    0.00    0.00    0.00
  1     2.94    0.05    0.14    0.03
  2     2.81    0.12    0.18    0.10
  3     2.70    0.17    0.18    0.18
  4     2.61    0.20    0.17    0.25
  5     2.54    0.22    0.17    0.31
  6     2.49    0.24    0.16    0.36
  7     2.45    0.25    0.15    0.40
  8     2.42    0.26    0.15    0.43

Table II.21.  R = 5  D/I = 1.0  N0/Z = 3.0


row        H     Pou   Pfull       Ω
  0     4.00    0.00    0.00    0.00
  1     4.60    0.26    0.68    0.12
  2     4.81    0.55    0.88    0.40
  3     4.90    0.72    0.94    0.77
  4     4.94    0.82    0.97    1.21
  5     4.96    0.88    0.98    1.69
  6     4.98    0.92    0.99    2.19
  7     4.98    0.94    0.99    2.72
  8     4.99    0.96    1.00    3.26

Table II.22.  R = 5  D/I = 0.0  N0/Z = 4.0


row        H     Pou   Pfull       Ω
  0     4.00    0.00    0.00    0.00
  1     4.20    0.21    0.47    0.09
  2     4.23    0.40    0.54    0.28
  3     4.24    0.53    0.56    0.51
  4     4.25    0.61    0.57    0.76
  5     4.26    0.68    0.58    1.02
  6     4.27    0.73    0.59    1.29
  7     4.29    0.76    0.59    1.57
  8     4.30    0.79    0.59    1.85

Table II.23.  R = 5  D/I = 0.5  N0/Z = 4.0


row        H     Pou   Pfull       Ω
  0     4.00    0.00    0.00    0.00
  1     3.79    0.16    0.32    0.07
  2     3.55    0.27    0.30    0.19
  3     3.37    0.33    0.27    0.30
  4     3.23    0.36    0.25    0.39
  5     3.13    0.39    0.24    0.47
  6     3.05    0.40    0.23    0.54
  7     2.99    0.41    0.22    0.59
  8     2.94    0.42    0.21    0.64

Table II.24.  R = 5  D/I = 1.0  N0/Z = 4.0
INDEXED SEQUENTIAL FILE DESIGN TABLES


row        H     Pou   Pfull       Ω
  0     5.00    0.00    1.00    0.00
  1     5.00    0.61    1.00    0.26
  2     5.00    0.82    1.00    0.61
  3     5.00    0.91    1.00    1.03
  4     5.00    0.95    1.00    1.48
  5     5.00    0.97    1.00    1.96
  6     5.00    0.98    1.00    2.46
  7     5.00    0.99    1.00    2.98
  8     5.00    0.99    1.00    3.51

Table II.25.  R = 5  D/I = 0.0  N0/Z = 5.0


row        H     Pou   Pfull       Ω
  0     5.00    0.00    1.00    0.00
  1     4.68    0.53    0.74    0.22
  2     4.56    0.69    0.67    0.47
  3     4.49    0.76    0.64    0.73
  4     4.45    0.81    0.63    1.00
  5     4.43    0.85    0.63    1.27
  6     4.42    0.87    0.62    1.55
  7     4.41    0.89    0.62    1.83
  8     4.41    0.90    0.62    2.11

Table II.26.  R = 5  D/I = 0.5  N0/Z = 5.0


row        H     Pou   Pfull       Ω
  0     5.00    0.00    1.00    0.00
  1     4.35    0.44    0.54    0.19
  2     4.02    0.54    0.42    0.34
  3     3.80    0.57    0.36    0.48
  4     3.64    0.59    0.32    0.59
  5     3.52    0.60    0.30    0.69
  6     3.43    0.60    0.28    0.76
  7     3.36    0.61    0.27    0.83
  8     3.30    0.81    0.26    0.89

Table II.27.  R = 5  D/I = 1.0  N0/Z = 5.0
B+ TREE FILE DESIGN TABLES


row        H     Pou   Pfull    Pmer
  0     3.00    1.00    0.00    1.00
  1     3.64    0.43    0.21    0.52
  2     3.71    0.40    0.27    0.49
  3     3.70    0.40    0.27    0.50
  4     3.70    0.41    0.27    0.50
  5     3.70    0.41    0.27    0.50
  6     3.70    0.41    0.27    0.50
  7     3.70    0.41    0.27    0.50
  8     3.70    0.41    0.27    0.50

Table II.28.  R = 5  D/I = 0.0  N0/Z = 3.0


row        H     Pou   Pfull    Pmer
  0     3.00    1.00    0.00    1.00
  1     3.62    0.45    0.23    0.55
  2     3.65    0.43    0.24    0.52
  3     3.66    0.43    0.24    0.52
  4     3.66    0.43    0.24    0.52
  5     3.66    0.43    0.24    0.52
  6     3.66    0.43    0.24    0.52
  7     3.66    0.43    0.24    0.52
  8     3.66    0.43    0.24    0.52

Table II.29.  R = 5  D/I = 0.5  N0/Z = 3.0


row        H     Pou   Pfull    Pmer
  0     3.00    1.00    0.00    1.00
  1     3.62    0.46    0.23    0.55
  2     3.63    0.45    0.23    0.54
  3     3.63    0.45    0.23    0.54
  4     3.63    0.45    0.23    0.54
  5     3.63    0.45    0.23    0.54
  6     3.63    0.45    0.23    0.54
  7     3.63    0.45    0.23    0.54
  8     3.63    0.45    0.23    0.54

Table II.30.  R = 5  D/I = 1.0  N0/Z = 3.0


row        H     Pou   Pfull    Pmer
  0     5.00    0.00    1.00    0.00
  1     3.70    0.47    0.37    0.57
  2     3.64    0.46    0.26    0.55
  3     3.67    0.43    0.26    0.52
  4     3.69    0.41    0.26    0.51
  5     3.69    0.41    0.27    0.50
  6     3.70    0.41    0.27    0.50
  7     3.70    0.41    0.27    0.50
  8     3.70    0.41    0.27    0.50

Table II.31.  R = 5  D/I = 0.0  N0/Z = 5.0


row        H     Pou   Pfull    Pmer
  0     5.00    0.00    1.00    0.00
  1     3.71    0.42    0.31    0.52
  2     3.65    0.44    0.25    0.53
  3     3.66    0.43    0.25    0.52
  4     3.66    0.43    0.24    0.52
  5     3.66    0.43    0.24    0.52
  6     3.66    0.43    0.25    0.52
  7     3.66    0.43    0.25    0.52
  8     3.66    0.43    0.25    0.52

Table II.32.  R = 5  D/I = 0.5  N0/Z = 5.0


row        H     Pou   Pfull    Pmer
  0     5.00    0.00    1.00    0.00
  1     3.69    0.42    0.27    0.51
  2     3.64    0.44    0.24    0.54
  3     3.63    0.44    0.23    0.54
  4     3.63    0.44    0.23    0.54
  5     3.63    0.44    0.23    0.54
  6     3.63    0.44    0.23    0.54
  7     3.63    0.44    0.23    0.54
  8     3.63    0.44    0.23    0.54

Table II.33.  R = 5  D/I = 1.0  N0/Z = 5.0
B+ TREE FILE DESIGN TABLES


row        H     Pou   Pfull    Pmer
  0     6.00    0.00    0.00    1.00
  1     6.96    0.00    0.03    0.39
  2     7.46    0.04    0.11    0.27
  3     7.43    0.08    0.16    0.33
  4     7.26    0.11    0.17    0.40
  5     7.14    0.12    0.16    0.43
  6     7.09    0.12    0.15    0.44
  7     7.07    0.12    0.14    0.44
  8     7.08    0.12    0.14    0.44

Table II.34.  R = 10  D/I = 0.0  N0/Z = 6.0


row        H     Pou   Pfull    Pmer
  0     6.00    0.00    0.00    1.00
  1     6.57    0.12    0.03    0.54
  2     6.90    0.12    0.07    0.44
  3     6.99    0.12    0.09    0.43
  4     7.00    0.13    0.10    0.44
  5     6.99    0.13    0.10    0.44
  6     6.98    0.13    0.10    0.45
  7     6.97    0.13    0.10    0.45
  8     6.96    0.13    0.10    0.45

Table II.35.  R = 10  D/I = 0.5  N0/Z = 6.0


row        H     Pou   Pfull    Pmer
  0     6.00    0.00    0.00    1.00
  1     6.43    0.20    0.06    0.62
  2     6.65    0.18    0.08    0.54
  3     6.72    0.17    0.08    0.52
  4     6.74    0.17    0.08    0.51
  5     6.75    0.16    0.08    0.51
  6     6.76    0.16    0.08    0.51
  7     6.76    0.16    0.08    0.51
  8     6.76    0.16    0.06    0.51

Table II.36.  R = 10  D/I = 1.0  N0/Z = 6.0


row        H     Pou   Pfull    Pmer
  0    10.00    0.00    1.00    0.00
  1     6.76    0.22    0.34    0.65
  2     6.46    0.20    0.14    0.63
  3     6.64    0.16    0.08    0.54
  4     6.87    0.13    0.09    0.47
  5     7.03    0.11    0.10    0.42
  6     7.13    0.10    0.12    0.40
  7     7.17    0.10    0.13    0.40
  8     7.18    0.11    0.14    0.40

Table II.37.  R = 10  D/I = 0.0  N0/Z = 10.0


row        H     Pou   Pfull    Pmer
  0    10.00    0.00    1.00    0.00
  1     6.94    0.20    0.26    0.58
  2     6.69    0.20    0.13    0.58
  3     6.72    0.18    0.10    0.54
  4     6.79    0.16    0.09    0.51
  5     6.85    0.15    0.09    0.49
  6     6.89    0.14    0.09    0.47
  7     6.92    0.14    0.10    0.46
  8     6.94    0.14    0.10    0.46

Table II.38.  R = 10  D/I = 0.5  N0/Z = 10.0


row        H     Pou   Pfull    Pmer
  0    10.00    0.00    1.00    0.00
  1     7.05    0.18    0.20    0.51
  2     6.80    0.18    0.11    0.53
  3     6.76    0.17    0.09    0.52
  4     6.76    0.17    0.09    0.52
  5     6.76    0.17    0.08    0.51
  6     6.76    0.17    0.08    0.51
  7     6.76    0.16    0.08    0.51
  8     6.76    0.16    0.08    0.51

Table II.39.  R = 10  D/I = 1.0  N0/Z = 10.0
APPENDIX III.  SOME DESCRIPTOR MACROS


The following parameters are used in defining macros for file descriptors:

R0 - record capacity of a (base file) primary block

Ro - record capacity of a (base file) overflow block

R1 - record capacity of a cluster index block

H0 - desired record occupancy of a base file primary block

H1 - desired record occupancy of a cluster index primary block

Nrec - number of records in file

S,So - primary and overflow block storage cost per unit time

A,Ao - primary and overflow block access cost

Macros  for  unordered,  hash  based,  interned  *,  indexed  sequential,  and  B+  tree 
file  descriptors  are: 

UNORDERED(RO,Nrec,S,A): 

L _ Ck _ Mectk  Rindex  Split  Ascend  A  Ao  S  So  N 

1  relative_location  0  1  0  OAOSO  Nrec 

level  i  Rj  Roj  Mvj  Hj  n<  PfulU  Poru  PmeVi  Zj 

IZoIOZqOO  0  0  0  1 

0  RO  1  0  //o  0  0  1/RO  0  0  Zo 

^  An  vntemal  file  is  a  file  whose  records  reside  in  main  memory. 
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f 


where  Zq  =  \NTec/R  0  1  and  =  Ntbc/Zq. 
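
As an illustration only, the UNORDERED macro can be sketched in Python as a function that returns the file-level row and the two level rows as dictionaries. The function name, the dictionary layout, and the key spellings (for example "Omega" and "Pou") are assumptions made for this sketch and are not notation from the text; only the cell values follow the macro above.

```python
import math


def unordered_descriptor(R0, Nrec, S, A):
    """Sketch of the UNORDERED(R0, Nrec, S, A) macro as plain dictionaries.

    Key names are illustrative assumptions; the values follow the macro.
    """
    Z0 = math.ceil(Nrec / R0)   # Z_0: number of primary blocks
    H0 = Nrec / Z0              # H_0: average records per primary block
    file_part = {
        "L": 1, "Ck": "relative_location", "Maxk": 0, "Rindex": 1,
        "Split": 0, "Ascend": 0, "A": A, "Ao": 0, "S": S, "So": 0, "N": Nrec,
    }
    levels = {
        1: {"R": Z0, "Ro": 1, "Mr": 0, "H": Z0, "G": 0, "Omega": 0,
            "Pfull": 0, "Pou": 0, "Pmer": 0, "Z": 1},
        0: {"R": R0, "Ro": 1, "Mr": 0, "H": H0, "G": 0, "Omega": 0,
            "Pfull": 1 / R0, "Pou": 0, "Pmer": 0, "Z": Z0},
    }
    return file_part, levels


if __name__ == "__main__":
    file_part, levels = unordered_descriptor(R0=10, Nrec=95, S=1.0, A=1.0)
    print(file_part)
    print(levels[1])
    print(levels[0])
```

With R0 = 10 and Nrec = 95 the sketch yields Z_0 = 10 blocks and H_0 = 9.5 records per block.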


HASH_BASED(R0,Ro,H0,Nrec,S,So,A,Ao):

    L   Ck     Maxk    Rindex  Split  Ascend  A  Ao  S  So  N
    1   hash   1/Z_0   1       0      0       A  Ao  S  So  Nrec

    level i   R_i   Ro_i  Mr_i  H_i   G_i   Ω_i   Pfull_i   Pou_i   Pmer_i  Z_i
    1         Z_0   1     0     Z_0   0     0     0         0       0       1
    0         R0    Ro    0     H_0   G_0   Ω_0   Pfull_0   Pou_0   0       Z_0

where Z_0 = ⌈Nrec/H0⌉.  Values for H_0, G_0, Ω_0, Pfull_0, and Pou_0 can be obtained
using equation (3.2) and applying the definitions in Tables 3.1 and 3.2.


INTERNAL(Nrec):

    L   Ck                  Maxk  Rindex  Split  Ascend  A  Ao  S  So  N
    0   relative_location   0     0       0      0       0  0   0  0   Nrec

    level i   R_i    Ro_i  Mr_i  H_i    G_i  Ω_i  Pfull_i  Pou_i  Pmer_i  Z_i
    0         Nrec   1     0     Nrec   0    0    0        0      0       1


INDEXED_SEQUENTIAL(R0,Ro,R1,H0,Nrec,S,So,A,Ao):

    L   Ck               Maxk  Rindex  Split  Ascend  A  Ao  S  So  N
    L   logical_valued   0     0       0      1       A  Ao  S  So  Nrec

    level i    R_i   Ro_i  Mr_i  H_i           G_i  Ω_i  Pfull_i   Pou_i  Pmer_i  Z_i
    L          1     1     0     1             0    0    0         0      0       1
    1≤i≤L-1    R1    1     0     Z_{i-1}/Z_i   0    0    0         0      0       Z_i
    0          R0    Ro    0     Nrec/Z_0      0    0    Pfull_0   0      0       Z_0

where Z_i = ⌈Nrec/H0⌉ / (R1)^i,  L = log_{R1}(Z_0) + 1,  and

    Pfull_0  =  1  if H0 = R0
                0  otherwise
B+TREE(R0,R1,H0,H1,Nrec,S,A):

    L   Ck               Maxk  Rindex  Split  Ascend  A  Ao  S  So  N
    L   logical_valued   0     0       1      1       A  0   S  0   Nrec

    level i    R_i   Ro_i  Mr_i         H_i           G_i  Ω_i  Pfull_i       Pou_i       Pmer_i   Z_i
    L          1     1     0            1             0    0    0             0           0        1
    L-1        R1    1     1            Z_{L-2}       0    0    Pfull_{L-1}   Pou_{L-1}   0        1
    1≤i≤L-2    R1    1     ⌈(R1+1)/2⌉   Z_{i-1}/Z_i   0    0    Pfull_i       Pou_i       Pmer_i   Z_i
    0          R0    1     ⌈(R0+1)/2⌉   Nrec/Z_0      0    0    Pfull_0       Pou_0       Pmer_0   Z_0

where Z_i = ⌈Nrec/H0⌉ / (H1)^i,  L = log_{H1}(Z_0) + 1,

    Pou_i    =  1  if i>0 and H1 = Mr_i  or
                   i=0 and H0 = Mr_0
                0  otherwise

    Pfull_i  =  1  if i>0 and H1 = R1  or
                   i=0 and H0 = R0
                0  otherwise

and

    Pmer_i   =  1  if i>0 and H1 = R1+1-Mr_i  or
                   i=0 and H0 = R0+1-Mr_0
                0  otherwise
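
The three indicator definitions of the B+TREE macro can be sketched as a small Python helper. This is a hedged illustration: the function names are hypothetical, the minimum occupancy Mr = ⌈(R+1)/2⌉ is taken from the level rows of the macro, and the equality tests follow the reading of the conditions given above, which should be checked against the original typescript.

```python
import math


def min_occupancy(R):
    """Mr = ceil((R + 1) / 2), as listed in the B+TREE level rows."""
    return math.ceil((R + 1) / 2)


def bptree_indicators(H, R):
    """Return Mr and the 0/1 values of Pou, Pfull, Pmer for one level.

    H is the desired occupancy of the level (H0 for level 0, H1 above it)
    and R is the block capacity of the level (R0 or R1).
    """
    Mr = min_occupancy(R)
    return {
        "Mr": Mr,
        "Pou": int(H == Mr),            # occupancy equals the minimum
        "Pfull": int(H == R),           # blocks are loaded completely full
        "Pmer": int(H == R + 1 - Mr),   # occupancy equals R + 1 - Mr
    }


if __name__ == "__main__":
    for H in (5, 6, 8, 10):
        print(H, bptree_indicators(H, R=10))
```

For R = 10 the helper gives Mr = 6, so only H = 10 sets Pfull, only H = 6 sets Pou, and only H = 5 sets Pmer.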


The following parameters are used in defining macros for linkset descriptors:

P   -  parent file
C   -  child file
W   -  average number of child records per linkset occurrence
Cc  -  number of nodes per cell

Macros for inverted and cellular serial linksets are:
INVERTED(P,C,W):

    Parent  Child
    File    File    SI  Cs  Cc  Pp  Ko  R1  D1  Ic  W
    P       C       0   0   0   0   0   0   0   0   W

CELLULAR_SERIAL(P,C,W):

    Parent  Child
    File    File    SI  Cs  Cc  Pp  Ko  R1  D1  Ic  W
    P       C       0   1   1   0   0   0   0   0   W

For a linkset L, let L be its descriptor and Child be the descriptor of its
child file.  Also, let Exact=1 if each cell directory of L identifies precisely
records; Exact=0 otherwise.  A macro for a cell directory descriptor of L is:

CELL_DIRECTORY(L,Exact):

f_ ef_ k_ w 

lyay^(Chiid)  (i.Exact)+Exactxjr^yA^<*^‘‘>  Avew^^^ 
APPENDIX  IV.  COST  EXPRESSIONS  FOR  A  STRUCTURE
AND  INDEX  SELECTION  OBJECTIVE  FUNCTION


Let ϑ be the configuration descriptor of an inverted file (see Table 6.11).
Let D be the data file and D its descriptor.¹  In order to define D, we introduce
the following parameters and define their values in terms of the parameters of
Table 6.10:

    Id_j  =  A_j is an identifier (=1) or not (=0)
          =  1  if Nrec = V_j
             0  otherwise

    R0    =  record capacity of the data file's primary block, level 0
          =  ⌊(blocklen - ptrlen) / (datalen + Σ_{j=1}^{k} length_j)⌋

    Ro    =  record capacity of the data file's overflow block, level 0
          =  ⌊blocklen / (datalen + Σ_{j=1}^{k} length_j + ptrlen)⌋

    R1_c  =  record capacity of the data file's cluster index block, with A_c as
             the cluster key
          =  ⌊blocklen / (length_c + (1 - Id_c) × mdlen + ptrlen)⌋
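
A worked sketch of these three capacities in Python may help when checking parameter values. It rests on several assumptions that the damaged source does not confirm: every capacity is rounded down, the summand in the R0 formula is length_j as in the Ro formula, and mdlen is simply a length parameter taken from Table 6.10. The function name, the argument layout, and the sample values are hypothetical.

```python
def data_file_capacities(blocklen, ptrlen, datalen, lengths, mdlen, c, is_identifier):
    """Return (R0, Ro, R1_c) for the data file.

    lengths[j]       : length of attribute field A_j
    is_identifier[j] : Id_j, 1 if A_j is an identifier, else 0
    c                : position of the cluster key in `lengths`
    All capacities are rounded down (an assumption of this sketch).
    """
    reclen = datalen + sum(lengths)
    R0 = (blocklen - ptrlen) // reclen       # primary block, level 0
    Ro = blocklen // (reclen + ptrlen)       # overflow block, level 0
    R1c = blocklen // (lengths[c] + (1 - is_identifier[c]) * mdlen + ptrlen)
    return R0, Ro, R1c


if __name__ == "__main__":
    # Illustrative values only; none of them come from Table 6.10.
    print(data_file_capacities(blocklen=1024, ptrlen=4, datalen=20,
                               lengths=[10, 30], mdlen=4, c=0,
                               is_identifier=[1, 0]))
```

With the sample values, a 60-byte record gives R0 = 17 and Ro = 16, and a 14-byte cluster index entry gives R1_c = 73.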


From the descriptor macros of Appendix III, we have:

    D  =  INDEXED_SEQUENTIAL(R0, Ro, R1_c, R0, Nrec, S, So, A, Ao)   if d = indexed seq
          HASH_BASED(R0, Ro, .8×R0, Nrec, S, So, A, Ao)              if d = hash based
          UNORDERED(R0, Nrec, S, A)                                  if d = unordered


¹ More accurately, the descriptor of D is a function of ϑ.  For notational simplicity,
however, we abbreviate D(ϑ) by D.  This abbreviation applies to other descriptors
introduced later in this appendix.

Let I_j be the index file for attribute A_j and let L_j be the linkset that connects I_j
to D.  The descriptor of L_j is given by:

    L_j  =  CELLULAR_SERIAL(I_j, D, Nrec/V_j)   if d = indexed seq
            INVERTED(I_j, D, Nrec/V_j)          otherwise

The descriptor CD_j of a cell directory of L_j is:

    CD_j  =  CELL_DIRECTORY(L_j, Id_j)

In order to define the descriptor of I_j, we introduce the following:

    reclen_j  =  length of attribute field + length of cell directory
              =  length_j + PTR(I_j, L_j) × ptrlen

    nr_j      =  number of records containing an A_j field and cell directory
              =  ⌈reclen_j / maxreclen⌉

    r0_j      =  record capacity of a primary block of I_j, level 0
              =  ⌊(blocklen - ptrlen) / min(maxreclen, reclen_j)⌋

    r1_j      =  record capacity of a cluster index block of I_j
              =  ⌊blocklen / (length_j + ptrlen)⌋

The descriptor of I_j is:

    I_j  =  B+TREE(r0_j, r1_j, .8×r0_j, .9×r1_j, V_j×nr_j, S, A)

The storage cost of an inverted file is:

    STORAGE(ϑ)  =  STOR(D)  +  Σ_{j=1}^{k} Index_j × STOR(I_j)

In the following models of transactions GET, CHANGE, ADD, and SUB, List
denotes the name of the cell directory field of an index record.²  To model GET,
we first present a transaction GETREC(QUERY_i, HOLD_phrase) which outputs
records that satisfy QUERY_i.

² Actually, the contents of a List field - a cell directory - may extend over several records.
This occurs in index file I_j when nr_j > 1.


TRANSACTION GETREC(QUERY_i, HOLD_phrase)

{ Let g_1, ..., g_t be the indices of (A_g = value_g) clauses of
  QUERY_i for which index I_g exists.
  if t > 0 then
     { comment *** retrieve and intersect inverted lists ***;
       cell_dir := RETRIEVE (List) WHERE A_g1 = value_g1;
       for i := 2 to t do
          { next_cd := RETRIEVE (List) WHERE A_gi = value_gi;
            cell_dir := intersect next_cd with cell_dir;
          };
       FOR rec := RETRIEVE_CHILD cell_dir WHERE QUERY_i VIA L_g1 HOLD_phrase DO
          { OUTPUT(rec); };
     }
  else
     { comment *** inverted lists not used ***;
       FOR rec := RETRIEVE D WHERE QUERY_i HOLD_phrase DO
          { OUTPUT(rec); };
     };
};
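
The control flow of GETREC can be mirrored in a few lines of Python. This is a minimal sketch under stated assumptions: the data file is modeled as a dictionary from record identifiers to records, each index as a dictionary from attribute values to sets of record identifiers (standing in for cell directories), and a query as a conjunction of (attribute = value) clauses. The function name and data layout are illustrative and ignore HOLD phrases, blocks, and costs.

```python
def getrec(query, data, indexes):
    """Return the records of `data` that satisfy every clause of `query`.

    query   : dict attribute -> required value
    data    : dict record id -> record (a dict of attribute values)
    indexes : dict attribute -> {value: set of record ids}
    """
    def satisfies(rec):
        return all(rec.get(attr) == value for attr, value in query.items())

    usable = [attr for attr in query if attr in indexes]
    if usable:
        # Retrieve and intersect the inverted lists of the indexed clauses.
        first = usable[0]
        cell_dir = set(indexes[first].get(query[first], set()))
        for attr in usable[1:]:
            cell_dir &= indexes[attr].get(query[attr], set())
        # Fetch the candidates and apply the full query to each one.
        return [data[rid] for rid in sorted(cell_dir) if satisfies(data[rid])]
    # Inverted lists not used: scan the whole data file.
    return [rec for rec in data.values() if satisfies(rec)]


if __name__ == "__main__":
    data = {
        1: {"dept": "A", "city": "X", "age": 30},
        2: {"dept": "A", "city": "Y", "age": 30},
        3: {"dept": "B", "city": "X", "age": 40},
    }
    indexes = {
        "dept": {"A": {1, 2}, "B": {3}},
        "city": {"X": {1, 3}, "Y": {2}},
    }
    print(getrec({"dept": "A", "city": "X"}, data, indexes))
    print(getrec({"age": 30}, data, indexes))
```

Applying the full query to each fetched record covers clauses on attributes that have no index, which is the role of the WHERE QUERY_i phrase in the child retrieval.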


A GET(QUERY_i, sortkey, HOLD_phrase), or GET_i, is realized by either copying
the output of GETREC(QUERY_i, HOLD_phrase) or piping its output into a SORT
and returning the SORT output.  A SORT is used in the case that GETREC does not
return records in the desired order.  GETREC returns records in cluster key
order whenever inverted lists are not used in retrieving records and the data file
is an indexed sequential file.  The GET transaction is:

TRANSACTION GET(QUERY_i, sortkey, HOLD_phrase)

{ Let t be the number of (A_g = value_g) clauses of
  QUERY_i for which index I_g exists.

  if ((t = 0 and sortkey ≠ "" and not (sortkey = cluster key of D and D is indexed sequential))
      or (t > 0 and sortkey ≠ "")) then
     { FOR x := SORT < GETREC(QUERY_i, HOLD_phrase) > OVER sortkey DO { OUTPUT(x); }; }
  else
     { FOR x := GETREC(QUERY_i, HOLD_phrase) DO { OUTPUT(x); }; };
};


Characteristic functions for GETREC and GET use the following:

    Lists_used_i   =  QUERY_i is processed by inverted lists (=1) or not (=0)
                   =  1  if Σ_{j=1}^{k} Index_j × EC_ij > 0
                      0  otherwise

    Sort_output_i  =  a SORT is used (=1) or not (=0) in a GET_i
                   =  1  if (Lists_used_i = 1 and sorted_i > 0) or (Lists_used_i = 0 and
                            sorted_i > 0 and not (sorted_i = c and d = indexed seq))
                      0  otherwise
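
The two characteristic functions translate directly into Python. In this hedged sketch the names and argument layout are assumptions: index and ec_i are 0/1 lists over the attributes, sorted_i is the number of the sort attribute with 0 meaning that no ordering is requested, c is the number of the cluster key, and d is the data file type given as a string.

```python
def lists_used(index, ec_i):
    """1 if at least one equality clause of the query has an index, else 0."""
    return 1 if sum(ix * ec for ix, ec in zip(index, ec_i)) > 0 else 0


def sort_output(lists_used_i, sorted_i, c, d):
    """1 if GET must pipe the GETREC output through a SORT, else 0."""
    if sorted_i <= 0:
        return 0      # no ordering requested
    if lists_used_i == 1:
        return 1      # records reached through inverted lists are unordered
    # A scan of an indexed sequential file already yields cluster key order.
    return 0 if (sorted_i == c and d == "indexed seq") else 1


if __name__ == "__main__":
    used = lists_used(index=[1, 0, 1], ec_i=[0, 0, 1])
    print(used, sort_output(used, sorted_i=2, c=2, d="indexed seq"))
```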
For GETREC:

    $(GETREC(QUERY_i, HOLD_phrase))(ϑ)  =

        RETC(L_g1, CQ_i)  +  Σ_{j=1}^{k} EC_ij × Index_j × RET(I_j, ONE_j)     if Lists_used_i = 1

        RET(D, QD_i)                                                           otherwise

    n(GETREC(QUERY_i, HOLD_phrase))(ϑ)  =  nRET(D, QD_i)  =  nRETC(L_g1, CQ_i)

Note that the number of records that satisfy QUERY_i is independent of the
method used to retrieve the records.  The descriptors QD_i, ONE_j, and CQ_i will be
defined shortly.

For GET:

    $(GET_i)(ϑ)  =  $(GETREC(QUERY_i, HOLD_phrase))(ϑ)

                    +  SORT(I_i, ALL)   if Sort_output_i = 1
                       0                otherwise

    n(GET_i)(ϑ)  =  n(GETREC(QUERY_i))(ϑ)

where I_i is the descriptor of an internal file:

    I_i  =  INTERNAL(n(GETREC(QUERY_i, HOLD_phrase))(ϑ))


Descriptors for CQ_i, QD_i, ONE_j, and ALL are defined in the following way:

    Indices_used_i  =  set of indices used to process QUERY_i
                    =  { j | EC_ij = 1 and Index_j = 1 }


(CQi) 

9 


VJ 


AND  CDy 

y  ^Indices^used^ 


=  linkset  scan 


Let: 


F_ij  if EC_ij = 1 and Id_j = 1
1  otherwise


KFi 


if  d=indexed  sequential 
Maxk^^  if  d=hash  based  and  ECfc^l 
1  otherwise 


then, 


query 


descriptor 

f 

ef 

kf 

Ss 

ALL 

1 

1 

1 

scan 

ONE; 

1/Vf 

range  search 

QDi 

KFi 

Ss 

i  J 


The following table, based on Table 2.3, was used to assign a value to Ss(QD_i).


QUERYi  contains  a 
clause  of  the  form: 

Corresponding
Condition

(cluster  key  =  value) 

ECii;  =  1  and  Id^.  =  1 

cluster  key  search 

(value  i^clzLster  key  lvalue  2) 

KFi<l 

range  search 

(  identifier  =  value  ) 

UEFii<l 

S 

partial  scan 

otherwise 

- 

scan 

Models  and  cost  equations  for  the  ADD,  SUB,  and  CHANGE  transactions  are: 
TRANSACTION ADD(rec)

{ r := INSERT rec INTO D HOLD;
  for each index file I_j do
     { index_rec := RETRIEVE I_j WHERE A_j = rec[A_j] HOLD;
       LINK r TO_PARENT index_rec VIA L_j;
       UPDATE index_rec IN I_j;
     };
  UPDATE r IN D;
};

    $(ADD)(ϑ)  =  INS(D)  +  Σ_{j=1}^{k} Index_j × [ RET(I_j, ONE_j) + LINK(L_j) + UPD(I_j) ]  +  UPD(D)

    n(ADD)(ϑ)  =  0

TRANSACTION SUB(rec)

{ for each index file I_j do
     { index_rec := RETRIEVE I_j WHERE A_j = rec[A_j] HOLD;
       UNLINK rec FROM_PARENT index_rec VIA L_j;
       UPDATE index_rec IN I_j;
     };
  REMOVE rec FROM D;
};

    $(SUB)(ϑ)  =  Σ_{j=1}^{k} Index_j × [ RET(I_j, ONE_j) + UNLK(L_j) + UPD(I_j) ]  +  REM(D)

    n(SUB)(ϑ)  =  0
TRANSACTION CHANGE(rec, A_j, newvalue)

{ if A_j is the cluster key of D then
     { SUB(rec);
       rec[A_j] := newvalue;
       ADD(rec);
     }
  else
     { if I_j exists then
          { oldlist := RETRIEVE I_j WHERE A_j = rec[A_j] HOLD;
            newlist := RETRIEVE I_j WHERE A_j = newvalue HOLD;
            UNLINK rec FROM_PARENT oldlist VIA L_j;
            LINK rec TO_PARENT newlist VIA L_j;
            UPDATE oldlist IN I_j;
            UPDATE newlist IN I_j;
          };
       rec[A_j] := newvalue;
       UPDATE rec IN D;
     };
};

    $(CHANGE_j)(ϑ)  =  $(SUB)(ϑ) + $(ADD)(ϑ)                              if j = c

                       Index_j × [ 2×RET(I_j, ONE_j) + UNLK(L_j)
                       + LINK(L_j) + 2×UPD(I_j) ]  +  UPD(D)              otherwise

    n(CHANGE_j)(ϑ)  =  0
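
To see how the three update cost equations combine, the following Python sketch evaluates them from a table of primitive operation costs. It is an illustration under assumptions: the primitive costs (INS, REM, UPD, RET, LINK, UNLK) are supplied as plain numbers rather than derived from descriptors, attributes are numbered from 0, and the dictionary keys and sample values are hypothetical.

```python
def add_cost(index, p):
    """$(ADD): insert into D, link through every existing index, update D."""
    per_index = sum(ix * (p["RET_I"][j] + p["LINK_L"][j] + p["UPD_I"][j])
                    for j, ix in enumerate(index))
    return p["INS_D"] + per_index + p["UPD_D"]


def sub_cost(index, p):
    """$(SUB): unlink through every existing index, then remove from D."""
    per_index = sum(ix * (p["RET_I"][j] + p["UNLK_L"][j] + p["UPD_I"][j])
                    for j, ix in enumerate(index))
    return per_index + p["REM_D"]


def change_cost(j, c, index, p):
    """$(CHANGE_j): SUB then ADD when A_j is the cluster key (j == c)."""
    if j == c:
        return sub_cost(index, p) + add_cost(index, p)
    relink = 2 * p["RET_I"][j] + p["UNLK_L"][j] + p["LINK_L"][j] + 2 * p["UPD_I"][j]
    return index[j] * relink + p["UPD_D"]


if __name__ == "__main__":
    # Illustrative primitive costs in arbitrary units; not taken from the text.
    p = {"INS_D": 3.0, "UPD_D": 2.0, "REM_D": 2.5,
         "RET_I": [1.0, 1.5], "LINK_L": [0.5, 0.5],
         "UNLK_L": [0.5, 0.5], "UPD_I": [1.0, 1.0]}
    index = [1, 0]   # attribute 0 is indexed, attribute 1 is not
    print(add_cost(index, p), sub_cost(index, p))
    print(change_cost(0, 1, index, p), change_cost(1, 1, index, p))
```

Changing the cluster key is priced as a deletion followed by an insertion because the record must move within D, whereas changing any other attribute touches at most one index and rewrites the record in place.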